douju9272 2018-10-09 17:21
Viewed 210 times

MongoDB connection fails on multiple app servers

We use MongoDB with the mgo driver for Go. There are two app servers connecting to MongoDB, with the apps (Go binaries) running alongside it. MongoDB runs as a replica set, and each app connects to the primary or a secondary depending on the replica set's current state.

We experienced a SocketException handling request, closing client connection: 9001 socket exception on one of the mongo servers, which caused the connections from our apps to MongoDB to die. After that, the replica set remained functional, but the connection also died on our second server (the one where the error did not happen).

In the Go logs it manifested as:

read tcp 10.10.0.5:37698->10.10.0.7:27017: i/o timeout

Why did this happen? How can this be prevented?

As I understand it, mgo connects to the whole replica set from a single instance's URL (it detects the full topology from that one seed), but why did the dying of the connection on one of the servers kill it on the second one as well?

Edit:

  1. The full package path used is "gopkg.in/mgo.v2".
  2. Unfortunately I can't share the mongo log files here. But besides the SocketException, the mongo logs don't contain anything useful. There is some indication of lock contention, with lock-acquire times occasionally quite high, but nothing beyond that.
  3. MongoDB does some heavy indexing at times, but there weren't any unusual spikes recently, so it's nothing beyond normal.

1 answer

  • dongna9185 2018-10-15 21:10

    First, the mgo driver you are using, gopkg.in/mgo.v2, developed by Gustavo Niemeyer (hosted at https://github.com/go-mgo/mgo), is no longer maintained.

    Instead, use the community-supported fork github.com/globalsign/mgo, which continues to be patched and to evolve.

    Its changelog includes "Improved connection handling", which seems to be directly related to your issue.

    Its details can be read here https://github.com/globalsign/mgo/pull/5 which points to the original pull request https://github.com/go-mgo/mgo/pull/437:

    If mongoServer fails to dial a server, it closes all sockets that are alive, whether they are currently in use or not. There are two cons:

    • In-flight requests are interrupted rudely.

    • All sockets are closed at the same time, so they are all likely to re-dial the server at the same time. Any occasional failure among that mass of dial requests (a high-concurrency scenario) causes all sockets to be closed again, and the cycle repeats... (This happened in our production environment.)

    So I think sockets currently in use should only be closed once they become idle.

    Note that github.com/globalsign/mgo has a backward-compatible API: it basically just adds a few new things/features (besides the fixes and patches), which means you should be able to just change the import paths and everything should work without further changes.

