dongquqiao2010 2014-07-12 00:18
浏览 88

Go Web应用程序+ MySql驱动程序停止/超时

UPDATE: After further diagnosis, there is a problem with the go-sql-driver/mysql I driver package. It turns out that the underlying tcp can't seem to detect a broken tcp connection. The full details are in the github issue in the go-sql-driver/mysql project below:

https://github.com/go-sql-driver/mysql/issues/257

--

I am currently experiencing a stalling or broken web app after a period of idle between 15 to 48 minutes. The most critical issue is described below:

  • Visit a URL, any url on the site, and load the page completely (as in, the page actually loads and the logs show a complete page has loaded).
  • Close browser, and wait.

A typical request is logged like this:

2014/07/13 15:29:54 INFO template rendering: index
2014/07/13 15:29:54 METRIC, URL: /, HANDLER TIME: 7.2339ms, CTX TIME: 5.0894ms, TOTAL TIME: 12.3258ms

After a long period of time (ranging from 15m to 48m), the system all of a sudden logs these lines below with no interaction - the web app has been idle this entire time:

[MySQL] 2014/07/13 16:00:09 packets.go:32: read tcp remote-mysql-server-address:3306: connection timed out
[MySQL] 2014/07/13 16:00:09 packets.go:118: write tcp remote-mysql-server-address:3306: broken pipe
2014/07/13 16:00:10 INFO template rendering: index
2014/07/13 16:00:10 METRIC, URL: /, HANDLER TIME: 8.8574ms, CTX TIME: 31m19.2606723s, TOTAL TIME: 31m19.2695329s

Notice the "TOTAL TIME" is 31 minutes and 19 seconds? Also, notice the MySql driver error that is logged at the same time?

There was no activity / no web request made. The web app was simply idle.

The most critical issue is what comes next after these log messages: the very next web request is stalls completely, never returning a response:

user@govm1:~$ wget http://localhost
--2014-07-13 17:11:18--  http://localhost/
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... Read error (Connection timed out) in headers.
Retrying.

--2014-07-13 17:26:19--  (try: 2)  http://localhost/
Connecting to localhost (localhost)|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html.4'

    [ <=>                                                                                  ] 6,310       --.-K/s   in 0.001s

2014-07-13 17:26:20 (9.61 MB/s) - `index.html.4' saved [6310]

And it sits idle, no response, for 15 minutes until wget times out.

Now, if I make a 2nd or 3rd request immediately after this one is stalled and anytime while it is stalled, the go web app responds and returns a full page for other requests. No issues. And then, the cycle starts over from the last request I make and let it site idle.

After this 15m, you can guess exactly what is logged next:

[MySQL] 2014/07/13 17:26:57 packets.go:32: read tcp remote-mysql-server-address:3306: connection timed out
[MySQL] 2014/07/13 17:26:57 packets.go:118: write tcp remote-mysql-server-address:3306: broken pipe
2014/07/13 17:26:57 INFO template rendering: index
2014/07/13 17:26:57 METRIC, URL: /, HANDLER TIME: 6.8938ms, CTX TIME: 15m39.1718434s, TOTAL TIME: 15m39.1787398s

Another 15m wait time.

I eliminated Windows Azure, the Cluster VIP and Firewall/Linux VM running the go web app as an issue because I ran wget http://localhost locally on the same box, and I get this "stalled" request that never completes and never sends back anything.

--

There are a number of factors in my web app so I will try to outline them accordingly.

Using:

  • Go 1.3
  • go-sql-driver/mysql ## Version 1.2 (2014-06-03)
  • Ubuntu 12.04 LTS, ~June 2014 Updates
  • Windows Azure

Do note that the Linux box running MySql is a different Linux box running the cluster of GoLang apps - and they are in separate dedicated Cloud Services. The MySql vm is a single VM, no cluserting.

Here is some related code:

// global handler for our DB
var db *sql.DB

// CLI parameter
var dbdsn string

func init() {

    flag.StringVar(&dbdsn, "dbdsn", "root:root@tcp(localhost:3306)/prod?timeout=5s&tls=false&autocommit=true", "Specifies the MySql DSN connection.")
    flag.Parse()

    var err error
    db, err = sql.Open("mysql", dbdsn)
    if err != nil {
        log.Printf("ERROR in sql.Open(): %v", err)
    }

    //db.SetMaxIdleConns(5)

    // verify the DSN is setup properly1
    err = db.Ping()
    if err != nil {
        panic("PANIC when pinging db: " + err.Error()) // proper error handling instead of panic in your app
    }
}

// **********
// * omitted is the Gorilla MUX router and http handler registrations
// **********

func ArticleHandler(w http.ResponseWriter, r *http.Request, c *Context) (err error) {

    m := NewArticle(c)
    id := c.Vars["id"]

    var pid int
    var title, body, excerpt, date, slug, fi, fv, region, region_slug string
    err = db.QueryRow(
        "SELECT p.ID, p.post_title, p.post_content, p.post_excerpt, p.post_date, p.post_name, "+
            "(SELECT fpim.meta_value FROM wp_postmeta fpim WHERE fpim.meta_key = '_wp_attached_file' AND fpim.post_id = (SELECT fpim2.meta_value FROM wp_postmeta fpim2 WHERE fpim2.post_id = p.ID AND fpim2.meta_key = '_thumbnail_id' LIMIT 1) LIMIT 1) AS featured_image, "+
            "(SELECT fpim3.meta_value FROM wp_postmeta fpim3 WHERE fpim3.meta_key = 'fv_video' AND fpim3.post_id = p.ID LIMIT 1) AS featured_video, "+
            "t.name as region, t.slug as region_slug "+
            "FROM wp_posts p "+
            "JOIN wp_term_relationships tr ON tr.object_id=p.ID "+
            "JOIN wp_term_taxonomy tt ON tt.term_taxonomy_id=tr.term_taxonomy_id "+
            "JOIN wp_terms t ON t.term_id=tt.term_id "+
            "WHERE p.post_name=? AND p.post_type='post' AND p.post_status='publish' AND p.post_date <= UTC_TIMESTAMP()"+
            "AND tr.object_id=p.ID AND tt.parent = (SELECT t3.term_id FROM wp_terms t3 WHERE t3.name=? LIMIT 1) LIMIT 1",
        id, RegionsParentCategory).
        Scan(&pid, &title, &body, &excerpt, &date, &slug, &fi, &fv, &region, &region_slug)

    if err != nil {
        if err == sql.ErrNoRows {

            // snipped code for redirects

            // article was not found
            return handleNotFound(w, r, c)

        } else {
            log.Printf("ERROR in .Scan(): %v", err)
        }
    } else {
        m.Region = Region{
            Name: region,
            Slug: region_slug,
        }
        m.Id = pid
        m.Title = title
        m.Body = template.HTML(body) // render the raw html
        m.Excerpt = excerpt
        m.Datetime = date
        m.Slug = slug
        m.FeaturedImageUrl = fi
        m.FeaturedVideoUrl = fv
    }

    web.RenderTemplate(w, "article", m)
    return
}

5 more DB queries, per request

In addition to this query, my "Context" you see being passed into the handler runs 4 to 6 additional SQL queries. Therefore, each "article" handler that loads runs about 5 to 7 SQL queries, minimal, using the exact same pattern and *db global variable you see above.

Timeouts / errors are always on the same DB query

Here's one of the "context" queries as a comparison:

rows2, err := db.Query(
    "SELECT p.post_title, p.post_name "+
        "FROM wp_posts p "+
        "WHERE p.post_type='page' AND p.post_status='publish' AND p.post_date <= UTC_TIMESTAMP() "+
        "AND p.post_parent = (SELECT p2.ID FROM wp_posts p2 WHERE p2.post_name=? LIMIT 1) "+
        "ORDER BY p.menu_order",
    FooterPagesParentNameSlug)
if err != nil {
    log.Printf("ERROR in AllPages .Query() : %v", err)
} else {
    defer rows2.Close()
    c.AllFooterPages = make([]FooterPage, 0)
    for rows2.Next() {
        var name, slug string
        err := rows2.Scan(&name, &slug)
        if err != nil {
            log.Printf("ERROR in AllPages row.Scan() : %v", err)
        } else {
            p := FooterPage{
                Page: Page{
                    Title: name,
                    Slug:  slug,
                },
            }
            c.AllFooterPages = append(c.AllFooterPages, p)
        }
    }
}

Nothing special there.

I do call defer rows2.Close() only if there was no error. Perhaps that is part of the issue? This particular SQL query seems to log errors under load tests as no response or mysql driver timing out.

Questions

Why am I getting request timeouts logged in excess of 15 to 30 minutes, from an idle site? That seems like a bug with the mysql driver I am using, possibly holding a connection open. But, the last http request was successful and returned a complete page + template.

I even have the Timeout set in the connection string, which is 5 seconds. Even if it is a problem with the mysql server, why the 15 minute timeout/request logged? Where did that request come from?

It still could be a MySql driver issue, blocking the request from completing - maybe being blocked by the MySql dedicated VM and an issue there. If that is the case, then how come nothing is logged? What is this random timeout of 15m to 49m minutes? It is usually only 15m or 31m, but sometimes 48m is logged.

It is very interesting on the "15m" multiples there in the timeouts (@15m, 31m and 48m), allotting for some padding there in seconds.

Thanks in advance.

  • 写回答

2条回答 默认 最新

  • dr5779 2014-07-12 03:24
    关注

    Should I be "opening" and defer db.Close() on every single web request/handler?

    No. Either create a global and don't worry about closing it (as you have now) OR pass the pool (*sql.DB) via an app context i.e. as per https://medium.com/@benbjohnson/structuring-applications-in-go-3b04be4ff091 by having your handlers as methods on a context type that embeds *sql.DB and anything else you might need.

    You may also want to look at using http://jmoiron.github.io/sqlx/ to help marshal your database results into structs/maps without having to do the dance yourself.


    As an aside (that will help debug the issue, I'd bet) I'd fix your second code example to match the below - because when you encounter an error you're only logging it, but your code continues as you don' return:

    rows2, err := db.Query(`
        SELECT p.post_title, p.post_name
        FROM wp_posts p 
        WHERE p.post_type='page' AND p.post_status='publish' AND p.post_date <= UTC_TIMESTAMP()
        AND p.post_parent = (SELECT p2.ID FROM wp_posts p2 WHERE p2.post_name=? LIMIT 1)
        ORDER BY p.menu_order`, FooterPagesParentNameSlug)
    if err != nil {
        log.Printf("ERROR in AllPages .Query() : %v", err)
        return err
    }
    defer rows2.Close()
    
    c.AllFooterPages = make([]FooterPage, 0)
    
    for rows2.Next() {
        var name, slug string
        err := rows2.Scan(&name, &slug)
        if err != nil {
            log.Printf("ERROR in AllPages row.Scan() : %v", err)
            return err // Same here!
        }
    // Rest of your code
    }
    
    评论

报告相同问题?

悬赏问题

  • ¥15 如何用stata画出文献中常见的安慰剂检验图
  • ¥15 c语言链表结构体数据插入
  • ¥40 使用MATLAB解答线性代数问题
  • ¥15 COCOS的问题COCOS的问题
  • ¥15 FPGA-SRIO初始化失败
  • ¥15 MapReduce实现倒排索引失败
  • ¥15 ZABBIX6.0L连接数据库报错,如何解决?(操作系统-centos)
  • ¥15 找一位技术过硬的游戏pj程序员
  • ¥15 matlab生成电测深三层曲线模型代码
  • ¥50 随机森林与房贷信用风险模型