weixin_39725594 2020-11-29 13:32

For 2.6.1: modelcache worker, multiwatcher and shared state stopped error handling.

Description of change

While testing the force destroy model functionality, I noticed that about 70% of my logs quickly fill up with:


134864d5-3510-42f9-81ee-a443e96c34ce: machine-0 2019-05-13 02:26:36 ERROR juju.worker.modelcache worker.go:171 watcher error, shared state watcher was stopped, getting new watcher
134864d5-3510-42f9-81ee-a443e96c34ce: machine-0 2019-05-13 02:26:36 ERROR juju.worker.modelcache worker.go:171 watcher error, shared state watcher was stopped, getting new watcher

Looking at the modelcache worker, it is aware of state.ErrStopped and knows what to do with it. However, besides state.ErrStopped, multiwatcher can also emit errors.Errorf("shared state watcher was stopped").

On close inspection, the two error messages are very similar and both are emitted in the correct circumstances. But when the modelcache worker receives them, it can only react properly to one and goes completely berserk with the other: it restarts continuously, spams the logs, etc.

This PR creates a new state error type, which multiwatcher uses to signify that a state watcher was stopped. This still allows errors to be emitted with exactly the same messages as before, and in addition enables error type checking in the model cache worker. With the patch in place, the logs do not get flooded and the model cache worker behaves correctly.
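
The idea can be illustrated with a minimal, standard-library-only sketch. This is not the code from the PR: the names `watcherStoppedError`, `NewWatcherStopped`, `IsWatcherStopped` and `classify` are hypothetical, and juju's own error helpers are deliberately not used here. It only shows how a dedicated error type lets a consumer recognise "watcher stopped" by type while the message text stays unchanged.

```go
package main

import (
	"errors"
	"fmt"
)

// watcherStoppedError is a hypothetical error type marking that a state
// watcher was stopped. It carries an arbitrary message, so existing log
// text does not have to change.
type watcherStoppedError struct {
	msg string
}

func (e *watcherStoppedError) Error() string { return e.msg }

// NewWatcherStopped (hypothetical) returns a typed "watcher stopped" error.
func NewWatcherStopped(msg string) error {
	return &watcherStoppedError{msg: msg}
}

// IsWatcherStopped (hypothetical) reports whether err, or any error it
// wraps, is a watcherStoppedError. The check is on the type, not the text.
func IsWatcherStopped(err error) bool {
	var target *watcherStoppedError
	return errors.As(err, &target)
}

// classify mimics, in a very simplified way, a consumer deciding how to
// react to an error coming back from a watcher.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case IsWatcherStopped(err):
		return "restart quietly"
	default:
		return "log error"
	}
}

func main() {
	typed := NewWatcherStopped("shared state watcher was stopped")
	untyped := errors.New("shared state watcher was stopped")

	// Same message, different handling: only the typed error is recognised.
	fmt.Println(typed.Error() == untyped.Error())
	fmt.Println(classify(typed))
	fmt.Println(classify(untyped))
	fmt.Println(classify(fmt.Errorf("multiwatcher: %w", typed)))
}
```

Matching on the type rather than on the message string means that two errors with nearly identical text no longer need to be told apart by string comparison, which is the kind of mismatch described above.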

(seems to be bug https://bugs.launchpad.net/juju/+bug/1785251)

This question originates from the open source project: juju/juju
