doudouba4520 2019-05-22 03:04
638 views
Accepted

BigQuery - fetch 1,000,000 records with Go and do some processing on the data

I have 1,000,000 records in BigQuery. What is the best way to fetch the data and process it using Go? I get a timeout if I fetch all the data without a limit; I have already increased the timeout to 5 minutes, but the query takes longer than that. I want to implement a streaming call or pagination, but I don't know how to do that in Go.

var FetchCustomerRecords = func(req *http.Request) *bigquery.RowIterator {
    ctx := appengine.NewContext(req)
    // Apply the 5-minute deadline to every BigQuery call. The cancel
    // function is deliberately discarded: the returned iterator keeps
    // using this context after the function returns.
    ctxWithDeadline, _ := context.WithTimeout(ctx, 5*time.Minute)
    log.Infof(ctx, "Fetch customer records from BigQuery")
    client, err := bigquery.NewClient(ctxWithDeadline, "ddddd-crm")
    if err != nil {
        log.Errorf(ctx, "failed to create BigQuery client: %v", err)
        return nil
    }
    q := client.Query("SELECT * FROM Something")
    q.Location = "US"
    job, err := q.Run(ctxWithDeadline)
    if err != nil {
        log.Errorf(ctx, "failed to start query job: %v", err)
        return nil
    }
    status, err := job.Wait(ctxWithDeadline)
    if err != nil {
        log.Errorf(ctx, "failed while waiting for job: %v", err)
        return nil
    }
    if err := status.Err(); err != nil {
        log.Errorf(ctx, "query job finished with error: %v", err)
        return nil
    }
    it, err := job.Read(ctxWithDeadline)
    if err != nil {
        log.Errorf(ctx, "failed to read query results: %v", err)
        return nil
    }
    return it
}

2 answers

  • dtz33344 2019-05-22 08:27

    You can read the table contents directly without issuing a query. This doesn't incur query charges, and provides the same row iterator as you would get from a query.

    For small results, this is fine. For large tables, I would suggest checking out the new Storage API, and the code sample on the samples page.
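
    The overall shape of a Storage API read is roughly the sketch below. This is a minimal sketch rather than the official sample: it assumes the v1 Go client (cloud.google.com/go/bigquery/storage/apiv1), reads the same public table over a single stream, and only counts rows instead of decoding the Avro payload (real code would decode each batch with an Avro library):

    package main

    import (
        "context"
        "fmt"
        "io"
        "log"

        bqstorage "cloud.google.com/go/bigquery/storage/apiv1"
        "cloud.google.com/go/bigquery/storage/apiv1/storagepb"
    )

    func main() {
        ctx := context.Background()

        // Client for the BigQuery Storage Read API.
        client, err := bqstorage.NewBigQueryReadClient(ctx)
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        // Create a read session; "my-project-id" is a placeholder for
        // the project that is billed for the read.
        session, err := client.CreateReadSession(ctx, &storagepb.CreateReadSessionRequest{
            Parent: "projects/my-project-id",
            ReadSession: &storagepb.ReadSession{
                Table:      "projects/bigquery-public-data/datasets/stackoverflow/tables/badges",
                DataFormat: storagepb.DataFormat_AVRO,
            },
            MaxStreamCount: 1, // one stream keeps the sketch simple
        })
        if err != nil {
            log.Fatal(err)
        }
        if len(session.GetStreams()) == 0 {
            log.Fatal("no read streams returned")
        }

        // Pull row batches off the stream until EOF.
        stream, err := client.ReadRows(ctx, &storagepb.ReadRowsRequest{
            ReadStream: session.GetStreams()[0].GetName(),
        })
        if err != nil {
            log.Fatal(err)
        }
        var rows int64
        for {
            resp, err := stream.Recv()
            if err == io.EOF {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
            // resp.GetAvroRows().GetSerializedBinaryRows() holds the raw
            // Avro-encoded rows; decoding is omitted from this sketch.
            rows += resp.GetRowCount()
        }
        fmt.Printf("read %d rows via the Storage API\n", rows)
    }

    Because the rows arrive as a gRPC stream, processing can start before the full result set has been transferred, which matters at the million-row scale in the question.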

    For a small table or simply reading a small subset of rows, you can do something like this (reads up to 10k rows from one of the public dataset tables):

    package bqread_test

    import (
        "context"
        "fmt"
        "testing"

        "cloud.google.com/go/bigquery"
        "google.golang.org/api/iterator"
    )

    func TestTableRead(t *testing.T) {
        ctx := context.Background()
        client, err := bigquery.NewClient(ctx, "my-project-id")
        if err != nil {
            t.Fatal(err)
        }

        // Reference the table directly; no query job is created.
        table := client.DatasetInProject("bigquery-public-data", "stackoverflow").Table("badges")
        it := table.Read(ctx)

        rowLimit := 10000
        var rowsRead int
        for rowsRead < rowLimit {
            var row []bigquery.Value
            err := it.Next(&row)
            if err == iterator.Done {
                break
            }
            if err != nil {
                t.Fatalf("error reading row offset %d: %v", rowsRead, err)
            }
            rowsRead++
            fmt.Println(row)
        }
    }

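    The question also asks about pagination. The same row iterator can be wrapped in iterator.NewPager (from google.golang.org/api/iterator) to hand rows to your processing code in fixed-size pages. A minimal sketch, assuming a table reference as above; the page size of 1000 and the handlePage callback are illustrative, not part of the BigQuery API:

    import (
        "context"

        "cloud.google.com/go/bigquery"
        "google.golang.org/api/iterator"
    )

    // pagedRead processes a table page by page. handlePage is a
    // hypothetical callback supplied by the caller.
    func pagedRead(ctx context.Context, table *bigquery.Table, handlePage func([][]bigquery.Value)) error {
        // Start at the first page (empty page token), 1000 rows per page.
        pager := iterator.NewPager(table.Read(ctx), 1000, "")
        for {
            var rows [][]bigquery.Value
            nextToken, err := pager.NextPage(&rows)
            if err != nil {
                return err
            }
            handlePage(rows)
            if nextToken == "" {
                return nil // NextPage returns an empty token after the last page
            }
        }
    }

    Because NextPage reports the next page token, you can also hand that token back to a caller (for example over HTTP) and resume later by passing it to a new pager, which is the usual way to paginate results across requests.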
    
    Accepted by the asker as the best answer.
