dongxiong4571 2015-05-28 10:03

Google App Engine Datastore query with a cursor does not iterate over all items

In my application I have a datastore query with a filter, such as:

datastore.NewQuery("sometype").Filter("SomeField<", 10)

I'm using a cursor to iterate over the results in batches (e.g. in different tasks). If the value of SomeField changes while I'm iterating, the cursor no longer works on Google App Engine (it works fine on devappserver).

I have a test project here: https://github.com/fredr/appenginetest. In my test I ran /db, which sets up the db with 10 items whose values are 0, then ran /run/2, which iterates over all items whose value is less than 2, in batches of 5, and updates the value of each item to 2.

The result on my local devappserver (all items are updated): [screenshot: devappserver result]

The result on App Engine (only five items are updated): [screenshot: appengine result]

Am I doing something wrong? Is this a bug? Or is this the expected result? In the documentation it states:

Cursors don't always work as expected with a query that uses an inequality filter or a sort order on a property with multiple values.

1 answer

  • dongshuan8722 2015-05-28 11:35

    The problem lies in the nature and implementation of cursors. A cursor contains the (encoded) key of the last processed entity, so if you set a cursor on your query before executing it, the Datastore jumps to the entity identified by that key and starts listing entities from that point.

    Let's examine your case

    Your query filter is Value<2. You iterate over the entities of the query result, and you change (and save) the Value property to 2. Note that Value=2 does not satisfy the filter Value<2.

    In the next iteration (the next batch) a cursor is present, which you apply properly. When the Datastore executes the query, it jumps to the last entity processed in the previous iteration and tries to list the entities that come after it. But the entity the cursor points to may no longer satisfy the filter, because the index entry for its new Value of 2 has most likely been updated by then. This is non-deterministic: eventual consistency applies here because you did not use an Ancestor query, which would guarantee strongly consistent results, and the time.Sleep() delay just increases the probability that the index has already been updated.

    So the Datastore sees that the last processed entity does not satisfy the filter. It does not search all the entities again; it reports that no more entities match the filter, so no more entities are updated (and no errors are reported).

    Suggestion: don't use cursors and filter or sort by the same property you're updating at the same time.
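
    To make the explanation above concrete, here is a toy in-memory model of it. It is only a sketch of the behavior as described in this answer, not of how the Datastore is actually implemented; Entity, queryBatch and runWithCursor are hypothetical names, and index updates are assumed to be visible immediately.

```go
package main

import "fmt"

// Entity is a hypothetical stand-in for a Datastore entity.
type Entity struct {
	Key   string
	Value int
}

// queryBatch is a toy model of "Value < max" with a cursor, following the
// explanation in this answer (an assumption, not Datastore's real algorithm):
// the cursor holds the key of the last processed entity, and if that entity
// no longer satisfies the filter, the query reports no further results.
func queryBatch(all []*Entity, max, limit int, cursor string) (batch []*Entity, next string) {
	start := 0
	if cursor != "" {
		for i, e := range all {
			if e.Key == cursor {
				if e.Value >= max {
					return nil, cursor // cursor entity left the result set
				}
				start = i + 1
				break
			}
		}
	}
	for _, e := range all[start:] {
		if e.Value < max && len(batch) < limit {
			batch = append(batch, e)
			next = e.Key
		}
	}
	return batch, next
}

// runWithCursor updates Value to max batch by batch, resuming from the
// cursor each time, and returns how many entities were updated.
func runWithCursor(all []*Entity, max, limit int) int {
	updated, cursor := 0, ""
	for {
		batch, next := queryBatch(all, max, limit, cursor)
		if len(batch) == 0 {
			return updated
		}
		for _, e := range batch {
			e.Value = max // this change pushes the entity out of the filter
			updated++
		}
		cursor = next
	}
}

func main() {
	all := make([]*Entity, 10)
	for i := range all {
		all[i] = &Entity{Key: fmt.Sprintf("e%d", i)}
	}
	fmt.Println("updated:", runWithCursor(all, 2, 5)) // prints "updated: 5"
}
```

    In this model the second batch comes back empty because the entity behind the cursor now has Value=2, which mirrors the "only five items are updated" result from the question.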

    By the way:

    The part from the Appengine docs you quoted:

    Cursors don't always work as expected with a query that uses an inequality filter or a sort order on a property with multiple values.

    This is not what you think. This means: cursors may not work properly on a property which has multiple values AND the same property is either included in an inequality filter or is used to sort the results by.

    By the way #2

    In the screenshot you are using SDK 1.9.17. The latest SDK version at the time of writing is 1.9.21. You should update and always use the latest available version.

    Alternatives to achieve your goal

    1) Don't use cursors

    If you have many records, you won't be able to update all your entities in one step (in one loop), but let's say you update 300 entities per run. If you repeat the query, the already updated entities will not appear in the results, because the updated Value=2 does not satisfy the filter Value<2. Just redo the query and the update until the query returns no results. Since your change is idempotent, no harm is done if the index entry of an entity is updated late and the entity is returned by the query more than once. To minimize the chance of this, delay the execution of the next query (e.g. wait a few seconds between queries).

    Pros: Simple. You already have the solution, just exclude the cursor handling part.

    Cons: Some entities might get updated multiple times (therefore the change must be idempotent). Also the change performed on entities must be something which will exclude the entity from the next query.
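
    The control flow of this alternative can be sketched as follows. This is a minimal illustration against a hypothetical in-memory Store (Item, Store, QueryLess and UpdateAll are made-up names, not Datastore API); in a real handler QueryLess would be the filtered query with a limit and no cursor. The in-memory store is strongly consistent, so the delay between queries mentioned above is not modeled.

```go
package main

import "fmt"

// Item is a hypothetical stand-in for a Datastore entity.
type Item struct {
	Key   string
	Value int
}

// Store is a hypothetical in-memory stand-in for the Datastore.
type Store struct {
	items []*Item
}

// QueryLess returns up to limit items whose Value is below max.
// It plays the role of the filtered query, run without any cursor.
func (s *Store) QueryLess(max, limit int) []*Item {
	var out []*Item
	for _, it := range s.items {
		if it.Value < max && len(out) < limit {
			out = append(out, it)
		}
	}
	return out
}

// UpdateAll re-runs the same query until it returns nothing. Each updated
// item drops out of the next query because Value == max no longer
// satisfies Value < max. It returns the number of non-empty batches.
func UpdateAll(s *Store, max, batchSize int) int {
	batches := 0
	for {
		batch := s.QueryLess(max, batchSize)
		if len(batch) == 0 {
			return batches
		}
		for _, it := range batch {
			it.Value = max // idempotent: applying it twice is harmless
		}
		batches++
	}
}

func main() {
	s := &Store{}
	for i := 0; i < 10; i++ {
		s.items = append(s.items, &Item{Key: fmt.Sprintf("item-%d", i)})
	}
	fmt.Println("batches:", UpdateAll(s, 2, 5))                 // prints "batches: 2"
	fmt.Println("remaining below 2:", len(s.QueryLess(2, 100))) // prints "remaining below 2: 0"
}
```

    The loop terminates only because the update removes each item from the result set, which is exactly the second "Cons" point above.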

    2) Using Task Queue

    You could first execute a keys-only query and defer the updates to tasks. You could create tasks that each receive, say, 100 keys; each task loads its entities by key and performs the update. This ensures each entity is updated only once. This solution has a slightly longer delay because it involves the task queue, but that is not a problem in most cases.

    Pros: No duplicated updates (therefore change may be non-idempotent). Works even if the change to be performed would not exclude the entity from the next query (more general).

    Cons: Higher complexity. Bigger lag/delay.
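
    The batching step of this alternative could look roughly like the sketch below. It only shows how a list of keys might be split into groups of at most 100; the keys are plain strings here and chunkKeys is a hypothetical helper. Running the keys-only query and enqueuing one task per batch are left out, since they depend on the App Engine packages.

```go
package main

import "fmt"

// chunkKeys splits keys into consecutive batches of at most size keys.
// It returns nil if size is not positive.
func chunkKeys(keys []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		batches = append(batches, keys[start:end])
	}
	return batches
}

func main() {
	// Hypothetical result of a keys-only query.
	keys := make([]string, 250)
	for i := range keys {
		keys[i] = fmt.Sprintf("key-%d", i)
	}
	for i, batch := range chunkKeys(keys, 100) {
		// A real handler would enqueue one task per batch here; the task
		// would load the entities by key and apply the update.
		fmt.Printf("task %d: %d keys\n", i, len(batch))
	}
}
```

    With 250 keys this yields three batches of 100, 100 and 50 keys, so each key ends up in exactly one task.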

    3) Using Map-Reduce

    You could use the map-reduce framework/utility to do massively parallel processing of many entities. I'm not sure whether it has been implemented for Go.

    Pros: Parallel execution, can handle even millions or billions of entities. Much faster for a large number of entities. Also has the pros listed under 2) Using Task Queue.

    Cons: Higher complexity. Might not be available in Go yet.

    This answer was selected as the best answer by the asker.
