weixin_39706561
2020-12-09 03:56

Set "Remove" operation removes all tags repeatedly

Currently, calling set.Rmv(key) runs an element prefix query for the given key and tombstones all the results it gets, i.e. a full element removal every time a Rmv() operation is run. This results in redundant tombstones ending up in the delta. Should it not tombstone only the elements that haven't yet been tombstoned?

Example.


set.Add("key", 1)     # produces delta with ID: Qa
set.Add("key", 2)     # produces delta with ID: Qb
set.Rmv("key")        # produces delta tombstoning Qa and Qb
set.Add("key", 3)     # produces delta with ID: Qc
set.Rmv("key")        # produces delta tombstoning Qa, Qb, and Qc

(Assume a merge operation is called between each action.) In the example above, Qa and Qb were tombstoned twice (once for each Rmv() call). This problem only gets worse if you make several hundred updates to a key followed by removals. Given that the entire BlockID is included in a tombstone delta, this inflates the storage requirements dramatically.

Given that each Merge call updates the local state key-value store, the record of which keys have been tombstoned is already available. When syncing from a remote store, we already sync the Merkle Clock DAG and its deltas as well, so again we have a consistent view of which keys have been tombstoned and when. In either case, don't we already have the necessary information to add tombstones only for elements that haven't actually been tombstoned yet?
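The optimization described above can be sketched with a toy model. Everything here (the orset type, its fields, and helper names) is hypothetical for illustration and is not the actual go-ds-crdt API; the point is only that a local record of tombstoned element IDs lets Rmv skip elements a previous Rmv already covered:

```go
package main

import "fmt"

// orset is a toy OR-Set element store (hypothetical, not the go-ds-crdt types).
type orset struct {
	elems      map[string][]string // key -> element IDs added under that key
	tombstoned map[string]bool     // element ID -> already tombstoned?
}

func newORSet() *orset {
	return &orset{elems: map[string][]string{}, tombstoned: map[string]bool{}}
}

func (s *orset) Add(key, id string) {
	s.elems[key] = append(s.elems[key], id)
}

// Rmv returns the element IDs a removal delta must tombstone.
// Naive behaviour would return every element for the key; here we
// consult local state and skip IDs that are already tombstoned.
func (s *orset) Rmv(key string) []string {
	var delta []string
	for _, id := range s.elems[key] {
		if s.tombstoned[id] {
			continue // already tombstoned by a previous Rmv; omit from delta
		}
		s.tombstoned[id] = true
		delta = append(delta, id)
	}
	return delta
}

func main() {
	s := newORSet()
	s.Add("key", "Qa")
	s.Add("key", "Qb")
	fmt.Println(s.Rmv("key")) // [Qa Qb]
	s.Add("key", "Qc")
	fmt.Println(s.Rmv("key")) // [Qc] (Qa and Qb are not re-tombstoned)
}
```

With this filtering, the second Rmv in the earlier example would emit a delta tombstoning only Qc, instead of Qa, Qb, and Qc.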

Is this an intended design for the Merkle CRDT semantics of an ORSet, or was it just taken verbatim from the original design in the delta-state paper?

This question originates from the open-source project: ipfs/go-ds-crdt


6 replies

  • weixin_39548740 · 5 months ago

    Thanks for looking into this! I think your intuitions are correct and this is an optimization we can probably do.

    I want to sleep on it, but my thinking now is that the order imposed by the DAG allows us to do this without consequences, even when multiple branches are present and when they are being synced in parallel (something that does not happen now, but that I would like to address some day).

    As mentioned, this should improve delta sizes for scenarios where a key has a large number of tombstones because it is being added and removed repeatedly.

  • weixin_39706561 · 5 months ago

    Thanks for the quick feedback. I'll dig into this more; I just wanted to make sure I wasn't missing something. From what I can tell (and obviously, since I created the issue), we should be able to do away with the duplicate tombstoning without issue. The only concern would be #23, but that is independent of this and more of a global issue across the entire design.

    I'll follow up after I do some tests with this optimization.

  • weixin_39548740 · 5 months ago

    Yes, I was thinking of #23 too, and had the same thought.

  • weixin_39548740 · 5 months ago

    Do you have a patch you want to submit for this?

  • weixin_39706561 · 5 months ago

    I do! I've been caught up with some other work, but I can open a PR.

  • weixin_39548740 · 5 months ago

    Fixed in #48. Thanks very much for the contribution!

