weixin_39963465
2020-11-27 17:05

Removed proposer indices update-on-miss and added a test

Credit to for the test.

A concern was raised that proposer indices in the cache could get mixed up due to reorgs. This is not a concern, because the proposer indices cache is refilled on every epoch transition. Relevant code paths 👇

https://github.com/prysmaticlabs/prysm/blob/master/beacon-chain/blockchain/process_block.go#L328
https://github.com/prysmaticlabs/prysm/blob/master/beacon-chain/core/helpers/committee.go#L364
https://github.com/prysmaticlabs/prysm/blob/master/beacon-chain/cache/committee.go#L127

The test did not call the correct methods, onBlock or ReceiveBlock; those are the methods that refill the proposer indices cache. With that said, I have removed update-on-miss in the helper function BeaconProposerIndex so it is consistent with how we update the proposer indices cache. The test passes after this change.
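The difference can be illustrated with a toy cache. Update-on-miss computes and stores a value inside the getter, while the fill-on-transition style only reads, leaving all writes to the epoch-transition path so the cache is populated in exactly one place. A minimal sketch, with names that are illustrative and not Prysm's actual API:

```python
import threading

class ProposerIndicesCache:
    """Toy cache keyed by seed; the real cache and key shape are Prysm-specific."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def fill(self, seed, indices):
        # Called only from the epoch-transition path.
        with self._lock:
            self._data[seed] = indices

    def get(self, seed):
        # Read-only: a miss returns None instead of filling the cache.
        with self._lock:
            return self._data.get(seed)

def beacon_proposer_index(cache, seed, slot, compute):
    indices = cache.get(seed)
    if indices is None:
        # Under update-on-miss the helper would compute *and store* here;
        # without it, the helper only computes and the cache stays untouched.
        return compute(seed, slot)
    return indices[slot % len(indices)]
```

On a miss the helper still answers correctly, but the cache contents are owned entirely by the epoch-transition fill.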

This question comes from the open-source project: prysmaticlabs/prysm


8 replies

  • weixin_39551188 · 5 months ago

    What if it happens concurrently? Then you have a cache filled for a given seed (which really lags one epoch behind), with proposer indices based on the effective balances of the filler. The reader may use the same cache key but have different effective balances, and thus get the wrong proposer index.

  • weixin_39551188 · 5 months ago

    The cache key and the lock on the cache by itself are not enough to entirely prevent concurrent functions from mixing up proposer indices. You would need a lock around the whole process of filling, using, and emptying the cache if you don't change the cache key to include the effective balances it was filled for. Or don't share the cache between different chains, and create a new one for each.
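One way to read the "lock around the whole process" suggestion: instead of locking individual cache operations, hold a single lock across fill, use, and eviction so no other user can interleave. A rough sketch, assuming a made-up helper (the real code is Go; the class and method names here are illustrative):

```python
import threading

class ScopedProposerCache:
    """Hold one lock across fill, use, and evict so users cannot interleave."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def with_proposer_indices(self, seed, compute, use):
        # The whole fill -> use -> evict sequence is a single critical section,
        # so a concurrent user can never read an entry filled for a different
        # set of effective balances.
        with self._lock:
            self._data[seed] = compute()
            try:
                return use(self._data[seed])
            finally:
                del self._data[seed]
```

Note this serializes every user of the cache, which is why the thread moves on to per-chain scoping or balance-aware keys instead.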

  • weixin_39963465 · 5 months ago

    > The cache key and lock on the cache by itself is not enough to entirely avoid concurrent functions from mixing up proposer indices. You would need a lock around the whole process of filling, using, and emptying it if you don't change the cache key to include the effective balances condition it was filled for. Or not share the cache between different chains, creating a new one for each.

    I think you are right; that does feel safer. I just added a lock around the usages of the cache. I will test this at runtime to ensure there is no deadlock.

  • weixin_39551188 · 5 months ago

    I think there's some misunderstanding here. Locking around all steps as one process was what I meant, not each individually, and it was an exaggeration to point out that the cache itself is borked.

    Imagine this, two "users" (goroutines using the cache):

    1. user A and B start from the middle of the same epoch, same chain
    2. user A transitions into next epoch
    3. user A fills cache (cache locks and unlocks, is updated)
    4. work of user A is parked somewhere
    5. user B processes some block before the epoch transition. This decreases the balance of a validator that was going to be selected as proposer.
    6. user B transitions into the next epoch. The epoch transition updates the effective balance of that one validator. If there were no cache, just following the spec, the following happens: due to the effective balance change, the validator can't be selected as proposer; the selection logic skips over it.
    7. user B tries to fill the cache, but the key already exists; it's the same as on the other chain (the committee seed is determined one epoch ahead, after all). So the cache is not updated with the different proposer for user B's perspective of effective balances.
    8. user B tries to determine the proposer index from the cache, and reads the wrong index: the one cached by user A.
    9. and if it were the other way around, and user B somehow got the cache changed to its own perspective, then user A would have the problem and see the perspective of B.
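The scenario above can be compressed into a toy simulation: a cache keyed only by the seed returns user A's proposer to user B even though B's effective balances would select someone else. The "proposer = richest validator" rule below is a stand-in for the real sampling, purely to make the mismatch visible:

```python
def toy_proposer(effective_balances):
    # Stand-in for compute_proposer_index: just pick the richest validator.
    return max(range(len(effective_balances)), key=lambda i: effective_balances[i])

cache = {}  # keyed by seed only: the flawed design under discussion

def get_or_fill(seed, effective_balances):
    # Fill-on-miss: a later fill for the same seed is silently skipped.
    if seed not in cache:
        cache[seed] = toy_proposer(effective_balances)
    return cache[seed]

seed = b"epoch-n-seed"   # known one epoch ahead, identical on both forks
balances_a = [32, 31]    # user A's view: validator 0 should propose
balances_b = [28, 31]    # user B's view after a penalty: validator 1 should propose
```

Calling `get_or_fill(seed, balances_a)` first caches validator 0; a later `get_or_fill(seed, balances_b)` still returns 0, even though `toy_proposer(balances_b)` is 1.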

    TLDR: proposers for next epoch can change on short notice due to effective balances. If the state of effective balances is not included in the cache key, things get messed up. Yes, it used to be different, about a year ago if I remember correctly, when effective balances were not considered for proposer selection.

    The solution is to scope the use of a proposers cache to just that of a single consistent chain, or fix the cache keys to make it work as a global cache.
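The second option, making it work as a global cache, amounts to mixing the effective-balance state into the key, for example by hashing the per-validator effective balances alongside the seed. A sketch under that assumption (the hash choice and key shape are illustrative, not Prysm's code):

```python
import hashlib

def proposer_cache_key(seed: bytes, effective_balances) -> bytes:
    # Mix every validator's effective balance into the key, so two states
    # with the same seed but any differing balance can no longer collide.
    h = hashlib.sha256(seed)
    for eb in effective_balances:
        h.update(int(eb).to_bytes(8, "little"))
    return h.digest()
```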

    Spec code of proposer function:

    ```python
    def compute_proposer_index(state: BeaconState, indices: Sequence[ValidatorIndex], seed: Bytes32) -> ValidatorIndex:
        """
        Return from ``indices`` a random index sampled by effective balance.
        """
        assert len(indices) > 0
        MAX_RANDOM_BYTE = 2**8 - 1
        i = uint64(0)
        total = uint64(len(indices))
        while True:
            candidate_index = indices[compute_shuffled_index(i % total, total, seed)]
            random_byte = hash(seed + uint_to_bytes(uint64(i // 32)))[i % 32]
            effective_balance = state.validators[candidate_index].effective_balance
            if effective_balance * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE * random_byte:
                return candidate_index
            i += 1
    ```

    Note how changes in effective balances can affect the proposer selection. This can change right in the epoch transition before the proposer's epoch, whereas the proposer seed is known an epoch in advance, like the committee seed, because it is based on an older randao mix.
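To see the skip in action, here is a simplified version of the loop above: `compute_shuffled_index` is replaced with the identity permutation (the shuffle is orthogonal to the point), but the effective-balance acceptance test is the spec's. Zeroing a validator's effective balance makes the check fail for every nonzero random byte, so selection skips to the next candidate even though the seed is unchanged:

```python
import hashlib

MAX_EFFECTIVE_BALANCE = 32 * 10**9   # Gwei
MAX_RANDOM_BYTE = 2**8 - 1

def toy_compute_proposer_index(effective_balances, seed: bytes) -> int:
    # The spec's acceptance loop, with the shuffle stubbed to identity.
    total = len(effective_balances)
    i = 0
    while True:
        candidate = i % total
        random_byte = hashlib.sha256(seed + (i // 32).to_bytes(8, "little")).digest()[i % 32]
        if effective_balances[candidate] * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE * random_byte:
            return candidate
        i += 1
```

With two full-balance validators, candidate 0 always passes the check; with its balance zeroed, candidate 0 only passes when the random byte happens to be 0, so validator 1 is selected almost every time.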

  • weixin_39963465 · 5 months ago

    > I think there's some misunderstanding here. Locking around all steps as one process was what I meant, not each individually, and it was an exaggeration to point out that the cache itself is borked. […]

    I see. Currently we let B change the cache, so A will have the problem. I'll try appending get_total_balance(state: BeaconState, indices: Set[ValidatorIndex]) to the key and see whether that avoids too much of a performance penalty.

  • weixin_39551188 · 5 months ago

    And then what if one validator decreases by 2 ETH and another increases by 2 ETH? That key won't work either. I recommend taking a break from hotfixes and going through this carefully, maybe later, if that helps. (Yes, it's my Friday too, and I am just as tired.)
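The objection is easy to demonstrate: two balance sets with equal totals but different per-validator values share a total-balance key, while a key over the full balance list distinguishes them. Both key functions here are illustrative:

```python
import hashlib

def key_by_total(seed: bytes, balances) -> bytes:
    # Key on the *sum* of effective balances, as floated above.
    return hashlib.sha256(seed + sum(balances).to_bytes(16, "little")).digest()

def key_by_balances(seed: bytes, balances) -> bytes:
    # Key on the full per-validator balance list instead.
    h = hashlib.sha256(seed)
    for bal in balances:
        h.update(bal.to_bytes(8, "little"))
    return h.digest()

seed = b"epoch-seed"
a = [30, 34]  # one validator down 2 ETH ...
b = [32, 32]  # ... another up 2 ETH: same total, possibly different proposer
```

`key_by_total` collides for `a` and `b`; `key_by_balances` does not.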

  • weixin_39963465 · 5 months ago

    Thanks, will close this in favor of an issue outlining the current problems and proposing designs.

  • weixin_39945679 · 5 months ago

    Codecov Report

    Merging #7421 into master will increase coverage by 0.02%. The diff coverage is n/a.

    ```diff
    @@            Coverage Diff             @@
    ##           master    #7421      +/-   ##
    ==========================================
    + Coverage   60.47%   60.49%   +0.02%
    ==========================================
      Files         419      419
      Lines       30618    30649      +31
    ==========================================
    + Hits        18516    18542      +26
    - Misses       9072     9077       +5
      Partials     3030     3030
    ```
