weixin_39559015
weixin_39559015
2020-12-26 15:55

[full ci] Improve exec error message for a CVM shutdown mid operation

This adds some error types and propagates them back from the portlayer to the personality. TaskInspect now returns a ConflictError if the supplied ID does not exist and we see that the state of the cvm is powered off. This error case may be too broad, but in TaskInspect we are not sure if the ID supplied was meant to be an exec ID or a session task ID. This means the solution is best guess, if the id is not found and we are off it is assumed that the operation has been interupted. I thought about just doing the state check again on the personality side of things, however the unknown task ID issue is already triggered after making it past the original state check in CreateExecTask, so I deemed that the task inspect check might need to be closer to the SoT(feel free to correct me on this as well).

Additionally, the guest reload config function has been wrapped in a retry that retries on transient errors. We may want to narrow down the potential retry case as I used the waiter.go intermittent error decider function in first pass since it was designed to retry tasks against vsphere that are supposed to be inherently transient. Based on discussion in this PR we may change that.

I have also added some logging, with some trace.Operation this is pending potential removal per 's review. I added it originally since it made it easier to track what all was getting called when I was learning the Exec path of operations. It was also needed since I needed to track concurrent calls against the same container.

note: please keep in mind this was my first exposure to the Exec portion of the code base, if I have done something silly please let me know. I spent a decent while poking around and trying understand end to end exactly what was going on.

Fixes #6370

该提问来源于开源项目:vmware/vic

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

5条回答

  • weixin_39559015 weixin_39559015 4月前

    TIL that the docker client calls multiple personality endpoints. So there will still be a concurrent modification error when we report the shutdown on the race condition edge. so now the error message looks like this

    
    Conflict error from portlayer: [PUT /containers/{handle}][409] commitConflict  &{Code:0 Message:Cannot complete operation due to concurrent modification by another operation.}^M
    Error response from daemon: Conflict error from portlayer: Cannot complete the operation, container 1f23ba97c6741466d04c2d3e95b3636711f9cf19c12e32737fa3cd7fd47ab411 has been powered off during execution
    
    点赞 评论 复制链接分享
  • weixin_39830233 weixin_39830233 4月前

    From my perspective I wouldn't add pkg/trace/Operation functionality in this PR. #2739 will result in changes to the package and I'll be adding operations in "strategic" places....so to minimize rework / wasted effort I'd vote for you to use fmt for now...

    点赞 评论 复制链接分享
  • weixin_39559015 weixin_39559015 4月前

    You got it

    点赞 评论 复制链接分享
  • weixin_39830233 weixin_39830233 4月前

    You have added the creation of an Operation at the beginning of most (if not all) the exec funcs -- please use the op.Debugf(log message) consistently. Specifically thinking about the type switches as part of the error handling -- use op.Debugf and not log.Debugf

    点赞 评论 复制链接分享
  • weixin_39830233 weixin_39830233 4月前

    LGTM

    Approved with PullApprove Approved with PullApprove

    点赞 评论 复制链接分享

相关推荐