weixin_39592789
weixin_39592789
2020-11-29 10:02

Flaky test: test_token_registered_race is flaky

https://circleci.com/gh/raiden-network/raiden/104723#artifacts/containers/12 https://circleci.com/gh/raiden-network/raiden/105340#tests/containers/12

The test registers the same token twice with the same registry, one of the transactions has to raise a recoverable error because it lost the race.

Some times the test fails because an unrecoverable error is raised, which is incorrect. It seems that both transactions have successfully done the preconditions checks, and the second fails on the gas estimation, this should be detected by either:

https://github.com/raiden-network/raiden/blob/e39b6ef781480d82cd4ba772eeebb1785ba69639/raiden/network/proxies/token_network_registry.py#L375-L380

or:

https://github.com/raiden-network/raiden/blob/e39b6ef781480d82cd4ba772eeebb1785ba69639/raiden/network/proxies/token_network_registry.py#L421-L422

However, both checks failed and this exception is raised:

https://github.com/raiden-network/raiden/blob/e39b6ef781480d82cd4ba772eeebb1785ba69639/raiden/network/proxies/token_network_registry.py#L424-L426

From geth's logs we see the gas estimation failure:

WARN [01-21|20:06:53.018] Served eth_estimateGas conn=127.0.0.1:46156 reqid=354 t=3.268166ms err="gas required exceeds allowance (9393937) or always failing transaction"

I don't fully understand why the checks above didn't see the existing token network, but perhpa sit is a race condition on the failing block number (If that is the case, this is related to issue https://github.com/raiden-network/raiden/issues/3890):

https://github.com/raiden-network/raiden/blob/e39b6ef781480d82cd4ba772eeebb1785ba69639/raiden/network/proxies/token_network_registry.py#L329-L332

该提问来源于开源项目:raiden-network/raiden

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

6条回答

  • weixin_39882948 weixin_39882948 5月前

    Also seen in https://app.circleci.com/jobs/github/raiden-network/raiden/105530

    点赞 评论 复制链接分享
  • weixin_39592789 weixin_39592789 5月前

    https://circleci.com/gh/raiden-network/raiden/106681#artifacts/containers/12

    点赞 评论 复制链接分享
  • weixin_39592789 weixin_39592789 5月前

    78561286.log raiden-debug_2020-01-30T10:24:18.990857.log

    This is the rough outline:

    • add_token is called concurrently with the same Ethereum account
    • both threads are scheduled in such a way that the triggering block is the same (0xcb8aed3f)
    • the first add_token (dd57915a) successfully does the gas estimation (gasLimit=4392813), and send the transaction 0x2fde401a. (by 10:25:19.974546 the eth_sendTransactionRaw had returned).
    • The transaction is available for Geth INFO [01-30|10:25:19.967] Submitted transaction fullhash=0x2fde401a5a1e14d1e069479e5d3f61200ff30bc2b0c7081e056d91ff3a3eef20 recipient=0x713565536Aa5495028EdA6EeF7aF9C65461765c3
    • Then the second add_token (0d6040e3) tries to do the gas estimation (at 10:25:20.010097, at this point gas estimation should be blocking on the RPC call)
    • The transaction is added to this block: INFO [01-30|10:25:20.002] Commit new mining work number=62 sealhash=f7221e…be0dc0 uncles=0 txs=1 gas=3349424 fees=0.003349424 elapsed=1.544ms
    • The gas estimation happens just after WARN [01-30|10:25:20.035] Served eth_estimateGas conn=127.0.0.1:33466 reqid=354 t=2.815359ms err="gas required exceeds allowance (9412308) or always failing transaction"
    • The Raiden node gets the bad gas estimation and start the post conditions checks, by 2020-01-30 10:25:20.254962 the exception is raised.

    So it looks the post conditions are indeed using the wrong block.

    点赞 评论 复制链接分享
  • weixin_39653448 weixin_39653448 5月前
    • for now skip it
    点赞 评论 复制链接分享
  • weixin_39605414 weixin_39605414 5月前

    Since we can't reliably get the block at which the gas estimation was executed, there is always the possibility of the post conditions to not detect the real reason.

    Some things we could do: * Reduce the likelihood of getting the wrong block by using the block number that is fetched immediately after the estimateGas. Currently, we fetch the block number a second time in the post conditions check * Rely on recent geth improvements that return the revert message on estimateGas. Problems: only works for geth, requires revert messages, probably not supported by web3py yet. * treat the "createERC20TokenNetwork failed for an unknown reason" error differently

    I'm not satisfied with any of these approaches. Suggestions?

    点赞 评论 复制链接分享
  • weixin_39605414 weixin_39605414 5月前

    For the test itself, we can mark it as flaky, so the test runner can rerun it. But that does not solve the underlying problem.

    点赞 评论 复制链接分享

相关推荐