duanjian4331
2017-07-13 13:42
140 views
Accepted

gocb: bulk inserting into a Couchbase bucket with golang - not all of the data is inserted

I am creating JSON data (approx. 5000 records) in my SQL Server instance and trying to insert it into a Couchbase bucket using a bulk insert operation in golang. The problem is that the entire data set is not being pushed: only a random number of records (between 2000 and 3000) is being inserted.

The code is:

package main

import (
    "database/sql"
    "log"
    "fmt"
    _ "github.com/denisenkom/go-mssqldb"
    "gopkg.in/couchbase/gocb.v1"
)


func main() {
    var (
        ID string
        JSONData string
    )

    var items []gocb.BulkOp      
    cluster, _ := gocb.Connect("couchbase://localhost")
    bucket, _ := cluster.OpenBucket("example", "")

    condb, _ := sql.Open("mssql", "server=.\\SQLEXPRESS;port=62587; user id=<id>;password=<pwd>;")

    // Get approx 5000 Records From SQL Server in JSON format
    rows, err := condb.Query("Select id, JSONData From User")
    if err != nil {
        log.Fatal(err)
    }

    for rows.Next() {
        _ = rows.Scan(&ID,&JSONData)
        items = append(items, &gocb.UpsertOp{Key: ID, Value: JSONData})
    }

    //Bulk Load JSON into Couchbase
    err = bucket.Do(items)
    if err != nil {
        fmt.Println("ERROR PERFORMING BULK INSERT:", err)
    }

    _ = bucket.Close() 
}

Please tell me where I went wrong here.

FYI, the columns ID and JSONData in the SQL query contain valid keys and JSON strings. Also, any advice on improving the way it's coded would be appreciated.


2 Answers

  • dsg7513
    dsg7513 2017-07-18 11:57
    Accepted

    I missed checking the Err field of the InsertOp type. When I did, I found that the items queue overflows once the data exceeds its capacity, and printing that field shows the message 'queue overflowed':

    for i := range items {
        fmt.Println( items[i].(*gocb.InsertOp).Err)
    }
    

    Attached screenshot of the error message is here: Err.png

    Is there any workaround for this limitation apart from splitting the data into a number of batches and performing multiple bulk inserts?
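    Splitting into batches is the usual approach, and the splitting itself is simple. A minimal sketch of a generic chunking helper (shown standalone here; with gocb you would pass each batch of `[]gocb.BulkOp` to `bucket.Do` in turn and then check every op's `Err` field):

    ```go
    package main

    import "fmt"

    // chunk splits xs into batches of at most n elements.
    // The last batch may be shorter.
    func chunk[T any](xs []T, n int) [][]T {
        var batches [][]T
        for n < len(xs) {
            batches = append(batches, xs[:n])
            xs = xs[n:]
        }
        if len(xs) > 0 {
            batches = append(batches, xs)
        }
        return batches
    }

    func main() {
        ops := make([]int, 5000) // stand-in for the []gocb.BulkOp slice
        batches := chunk(ops, 1000)
        fmt.Println(len(batches)) // 5
    }
    ```

    Keeping each batch well under the client's internal queue size avoids the 'queue overflowed' error.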

    点赞 评论
  • duanpo2037
    duanpo2037 2017-07-13 21:36

    Why not try using a number of goroutines and a channel to synchronize them? Create a channel of items that need to be inserted, then start 16 or more goroutines which read from the channel, perform the insert, and continue. The most obvious bottleneck for a strictly serial inserter is the network round-trip; if you have many goroutines performing inserts at once, you will vastly improve performance.

    P.S. The issue with bulk insert not inserting every document is a strange one; I am going to look into it. As @ingenthr mentioned above, though, is it possible that you are doing upserts and have multiple operations for the same keys?

    (Earlier comment, posted before the accepted answer: Are you getting any error outputs from the bulk insert?)
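    A sketch of that worker-pool pattern, with the actual gocb call abstracted behind an insert callback (the doc type and insertAll helper are illustrative names, not part of gocb; in real code the callback would call something like bucket.Upsert):

    ```go
    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
    )

    // doc is a stand-in for one record pulled from SQL Server.
    type doc struct {
        key   string
        value string
    }

    // insertAll fans docs out to nWorkers goroutines; insert is the
    // per-document operation. Any errors are collected and returned.
    func insertAll(docs []doc, nWorkers int, insert func(doc) error) []error {
        ch := make(chan doc)
        errCh := make(chan error, len(docs))
        var wg sync.WaitGroup
        for i := 0; i < nWorkers; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for d := range ch {
                    if err := insert(d); err != nil {
                        errCh <- err
                    }
                }
            }()
        }
        for _, d := range docs {
            ch <- d
        }
        close(ch) // no more work; workers drain and exit
        wg.Wait()
        close(errCh)
        var errs []error
        for err := range errCh {
            errs = append(errs, err)
        }
        return errs
    }

    func main() {
        docs := make([]doc, 100)
        var count int64
        errs := insertAll(docs, 16, func(d doc) error {
            atomic.AddInt64(&count, 1) // simulated insert
            return nil
        })
        fmt.Println(count, len(errs)) // 100 0
    }
    ```

    Closing the channel after all sends lets every worker's range loop terminate cleanly, and the WaitGroup ensures no insert is still in flight when the function returns.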

