doushou1298 2016-11-09 16:17

I think my Go servers are killing my Mongo server

So long story short, I have many servers across 3 AWS regions. In an effort to reduce the server count, I've rewritten the servers, originally written in Node, in Go. To my delight and surprise, 1 Go server can handle what 10 Node servers handle, with lower CPU utilization and less ELB latency. This is AWESOME lol. However, I think maybe they are too fast lol. We have a single (yes, I know, very bad) MongoDB server used to log data. Every request that comes in is logged in a particular way and sent to this server. Since adding just 2 Go servers, that MongoDB server has started to crash.

TRACEBACK

2016-11-09T03:06:41.240-0500 I JOURNAL  [journal writer] warning couldn't write to / rename file /data/data/db/journal/prealloc.2: Couldn't open directory '/data/data/db/journal' for flushing: errno:24 Too many open files
2016-11-09T03:06:41.240-0500 I JOURNAL  [journal writer] warning exception opening journal file couldn't open file /data/data/db/journal/j._213 for writing errno:9 Bad file descriptor
2016-11-09T03:06:41.240-0500 I JOURNAL  [journal writer] error exception in dur::journal couldn't open file /data/data/db/journal/j._213 for writing errno:9 Bad file descriptor
2016-11-09T03:06:41.242-0500 F JOURNAL  [journal writer] dbexception in journalWriterThread causing immediate shutdown: 13516 couldn't open file /data/data/db/journal/j._213 for writing errno:9 Bad file descriptor
2016-11-09T03:06:41.242-0500 I -        [journal writer] Invariant failure false src/mongo/db/storage/mmap_v1/dur_journal_writer.cpp 258
2016-11-09T03:06:41.246-0500 I JOURNAL  [durability] warning couldn't write to / rename file /data/data/db/journal/prealloc.2: couldn't open file /data/data/db/journal/prealloc.2 for writing errno:9 Bad file descriptor
2016-11-09T03:06:41.246-0500 F JOURNAL  [durability] dbexception in durThread causing immediate shutdown: 13516 couldn't open file /data/data/db/journal/j._213 for writing errno:9 Bad file descriptor
2016-11-09T03:06:41.246-0500 I -        [durability] Invariant failure false src/mongo/db/storage/mmap_v1/dur.cpp 862
2016-11-09T03:06:41.246-0500 I CONTROL  [journal writer]
 0xf51949 0xef1671 0xed6192 0xd17613 0xf9f9b4 0x7fd09697d184 0x7fd09544337d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B51949"},{"b":"400000","o":"AF1671"},{"b":"400000","o":"AD6192"},{"b":"400000","o":"917613"},{"b":"400000","o":"B9F9B4"},{"b":"7FD096975000","o":"8184"},{"b":"7FD095349000","o":"FA37D"}],"processInfo":{ "mongodbVersion" : "3.0.3", "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105", "uname" : { "sysname" : "Linux", "release" : "3.13.0-54-generic", "version" : "#91-Ubuntu SMP Tue May 26 19:15:08 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "F56F80CB96B4DBFC070BEB0ADAC7D6B274BFC6B1" }, { "b" : "7FFF14FD0000", "elfType" : 3, "buildId" : "1C0D0A18FF043EED9EE11DB5E5E90A3F74729341" }, { "b" : "7FD096975000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "31E9F21AE8C10396171F1E13DA15780986FA696C" }, { "b" : "7FD096716000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "74864DB9D5F69D39A67E4755012FB6573C469B3D" }, { "b" : "7FD09633A000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "AAE7CFF8351B730830BDBCE0DCABBE06574B7144" }, { "b" : "7FD096132000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "E2A6DD5048A0A051FD61043BDB69D8CC68192AB7" }, { "b" : "7FD095F2E000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "DA9B8C234D0FE9FD8CAAC8970A7EC1B6C8F6623F" }, { "b" : "7FD095C2A000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "76190E922AF7457D078F75C9B15FA184E83EB506" }, { "b" : "7FD095924000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "D144258E614900B255A31F3FD2283A878670D5BC" }, { "b" : "7FD09570E000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7FD095349000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "CF699A15CAAE64F50311FC4655B86DC39A479789" }, { "b" : "7FD096B93000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "D0F537904076D73F29E4A37341F8A449E2EF6CD0" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf51949]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xef1671]
 mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xB2) [0xed6192]
 mongod(_ZN5mongo3dur13JournalWriter20_journalWriterThreadEv+0x953) [0xd17613]
 mongod(+0xB9F9B4) [0xf9f9b4]
 libpthread.so.0(+0x8184) [0x7fd09697d184]
 libc.so.6(clone+0x6D) [0x7fd09544337d]

In an effort to fix this I increased the ulimit on that machine. However, I think something can also be done on the programming side.

Here is my mongo package

package mongo

import (
    "time"

    "gopkg.in/mgo.v2"
)

var (
    // TagDB ...
    TagDB DataStore
    // LogDB ...
    LogDB DataStore
)

func init() {
    TagDB.ConnectToTagserver()
    LogDB.ConnectToLogServer()
}

// DataStore containing a pointer to a mgo session
type DataStore struct {
    Session *mgo.Session
}

// ConnectToTagserver is a helper method that connects to pubgears' tagserver
// database
func (ds *DataStore) ConnectToTagserver() {
    mongoDBDialInfo := &mgo.DialInfo{
        Addrs:    []string{"some_IP"},
        Timeout:  60 * time.Second,
        Database: "tagserver",
    }
    sess, err := mgo.DialWithInfo(mongoDBDialInfo)
    if err != nil {
        panic(err)
    }
    sess.SetMode(mgo.Monotonic, true)
    ds.Session = sess // assign to the receiver rather than the package-level variable
}


// ConnectToLogServer is a helper method that connects to the log server
// database (the server in question)
func (ds *DataStore) ConnectToLogServer() {
    mongoDBDialInfo := &mgo.DialInfo{
        Addrs:    []string{"some_IP"},
        Timeout:  60 * time.Second,
        Database: "nginx_logs",
    }
    sess, err := mgo.DialWithInfo(mongoDBDialInfo)
    if err != nil {
        panic(err)
    }
    sess.SetMode(mgo.Monotonic, true)
    ds.Session = sess // assign to the receiver rather than the package-level variable
}

// Close is a helper method that ensures the session is properly terminated
func (ds *DataStore) Close() {
    ds.Session.Close()
}
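
Not part of the original package, but one client-side knob worth noting: mgo's DialInfo has a PoolLimit field (the per-server socket cap, defaulting to 4096), which keeps a single Go process from holding an unbounded number of sockets against mongod. A sketch in the same package; connectWithPoolLimit and the limit of 64 are illustrative, not a recommendation:

// connectWithPoolLimit is a hypothetical variant of ConnectToLogServer with a
// bounded socket pool. Once PoolLimit sockets are in use, operations block
// waiting for a free one instead of opening more.
func (ds *DataStore) connectWithPoolLimit() {
    sess, err := mgo.DialWithInfo(&mgo.DialInfo{
        Addrs:     []string{"some_IP"},
        Timeout:   60 * time.Second,
        Database:  "nginx_logs",
        PoolLimit: 64, // illustrative cap; mgo's default is 4096
    })
    if err != nil {
        panic(err)
    }
    sess.SetMode(mgo.Monotonic, true)
    ds.Session = sess
}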

Here is the file that uses that package

package models

import (
    "errors"
    "fmt"
    "strconv"
    "time"

    "gopkg.in/mgo.v2/bson"

    "./mongo"
)

// RawRequests ...
type RawRequests struct {
    TagServer   string            `json:"tag_server"`
    Server      string            `json:"server"`
    Slug        string            `json:"slug"`
    Zone        string            `json:"zone"`
    Size        string            `json:"size"`
    Network     string            `json:"network"`
    TagHash     string            `json:"tag_hash"`
    Extra       string            `json:"extra"`
    Logged      time.Time         `json:"logged"`
    Date        string            `json:"date"`
    Hour        int               `json:"hour"`
    QueryParams map[string]string `json:"query_params"`
}

// RawTagRequests ...
type RawTagRequests struct {
    TagServer   string            `json:"tag_server"`
    Server      string            `json:"server"`
    Slug        string            `json:"slug"`
    Zone        string            `json:"zone"`
    Size        string            `json:"size"`
    Network     string            `json:"network"`
    TagHash     string            `json:"tag_hash"`
    Extra       string            `json:"extra"`
    Logged      time.Time         `json:"logged"`
    Date        string            `json:"date"`
    Hour        int               `json:"hour"`
    QueryParams map[string]string `json:"query_params"`
    ChainNext   string            `json:"chain_next"`
}

// LogRequest ...
func LogRequest(tagServer string, server string, slug string, zone string, size string, network string, extra string, tagHash string, queryParams map[string]string) error {
    dbsession := mongo.LogDB.Session.Copy()
    defer dbsession.Close()
    logCollection := dbsession.DB("nginx_logs").C("raw_requests")
    d := time.Now()
    hour, _, _ := d.Clock()
    year, month, day, _ := formatDate(d.Year(), int(d.Month()), d.Day())
    date := fmt.Sprintf("%s-%s-%s", year, month, day)
    return logCollection.Insert(bson.M{
        "tag_server": tagServer, "server": server, "slug": slug, "zone": zone,
        "size": size, "network": network, "tag_hash": tagHash, "extra": extra,
        "logged": d, "date": date, "hour": hour, "query_params": queryParams,
    })
}

// LogTagRequests ...
func LogTagRequests(tagServer string, server string, slug string, zone string, size string, network string, extra string, tagHash string, queryParams map[string]string, chainHash string) error {
    dbsession := mongo.LogDB.Session.Copy()
    defer dbsession.Close()
    logCollection := dbsession.DB("nginx_logs").C("raw_tag_requests")
    d := time.Now()
    hour, _, _ := d.Clock()
    year, month, day, _ := formatDate(d.Year(), int(d.Month()), d.Day())
    date := fmt.Sprintf("%s-%s-%s", year, month, day)
    return logCollection.Insert(bson.M{
        "tag_server": tagServer, "server": server, "slug": slug, "zone": zone,
        "size": size, "network": network, "tag_hash": tagHash, "extra": extra,
        "logged": d, "date": date, "hour": hour, "query_params": queryParams,
        "chain_hash": chainHash,
    })
}

func formatDate(year int, month int, day int) (y string, m string, d string, err error) {
    tyear := strconv.Itoa(year)
    // zero-pad the month and day to two digits
    tmonth := strconv.Itoa(month)
    if month < 10 {
        tmonth = "0" + tmonth
    }
    tday := strconv.Itoa(day)
    if day < 10 {
        tday = "0" + tday
    }
    if len(tmonth) == 2 && len(tday) == 2 && len(tyear) == 4 {
        return tyear, tmonth, tday, nil
    }
    return tyear, tmonth, tday, errors.New("one of the values passed in does not meet the date format requirement year-month-day --> 1991-01-15")
}
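
As an aside, the standard library can produce the same zero-padded date string directly via time's reference layout, which would let formatDate and the Sprintf call go away entirely; a sketch, using the d already in scope in LogRequest:

// Format zero-pads the month and day automatically with this layout.
date := d.Format("2006-01-02") // e.g. "2016-11-09"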

Lastly, within my route I simply call the method as a goroutine: go LogRequest(tagServer, host, site, channel, adSize, network, tagExtra, tagHash, queryParams)
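
As an aside, each of those goroutines checks out its own socket, so under load the number of in-flight inserts is unbounded. A minimal sketch of capping them with a semaphore channel; logSem, its capacity of 128, and the log import are assumptions, not part of the original route:

// logSem bounds how many log inserts may run at once (128 is illustrative).
var logSem = make(chan struct{}, 128)

// In the route, instead of calling `go LogRequest(...)` directly:
logSem <- struct{}{} // blocks while 128 inserts are already in flight
go func() {
    defer func() { <-logSem }() // release the slot when the insert finishes
    if err := LogRequest(tagServer, host, site, channel, adSize, network,
        tagExtra, tagHash, queryParams); err != nil {
        log.Println("log insert failed:", err)
    }
}()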


1 Answer

  • duandun2218 2016-11-10 07:07 (accepted answer)
    1. You should call Copy only once per request-response cycle. Calling it inside every model function consumes a lot of sockets: each model call opens a socket and closes it (via the defer) when the function exits. Reuse one copied session across all of the database calls in the same request-response cycle (see the sketch after this list).
    2. From the logs and the code snippet, point 1 is what you need to fix, unless each request-response cycle makes exactly one CRUD call through your models.
    3. Copy opens a new socket for the connection while preserving the auth credentials of the parent session, whereas Clone reuses the parent session's socket (with a cloned session, parent and child take turns waiting on that one socket). To handle multiple requests in parallel, it is therefore better to Copy once per request handler. (https://godoc.org/gopkg.in/mgo.v2#Session.Clone)
    4. After increasing the ulimit on a machine, make sure the new limit has actually taken effect: http://lzone.de/blog/Apply-changes-to-limits.conf-immediately
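
    A minimal sketch of point 1, assuming an HTTP handler and a hypothetical logRequest variant that accepts the per-request session (the names and wiring are illustrative, not the asker's actual code):

    package main

    import (
        "log"
        "net/http"

        "gopkg.in/mgo.v2"
        "gopkg.in/mgo.v2/bson"
    )

    // rootSession is dialed once at startup; each handler copies it per request.
    var rootSession *mgo.Session

    // logRequest is a hypothetical variant of models.LogRequest that takes the
    // per-request session instead of copying the global one on every call.
    func logRequest(sess *mgo.Session, server string) error {
        return sess.DB("nginx_logs").C("raw_requests").Insert(bson.M{"server": server})
    }

    func handler(w http.ResponseWriter, r *http.Request) {
        // One Copy per request-response cycle: one socket checkout for the whole
        // request, returned to the pool on Close.
        sess := rootSession.Copy()
        defer sess.Close()

        // Every database call in this request reuses the same copied session.
        if err := logRequest(sess, r.Host); err != nil {
            log.Println("log insert failed:", err)
        }
    }

    func main() {
        var err error
        rootSession, err = mgo.Dial("some_IP") // placeholder address from the question
        if err != nil {
            log.Fatal(err)
        }
        http.HandleFunc("/", handler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }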
