doucheng3407 2016-08-11 21:34

Need a faster way to list all datasets/tables in a project

I am creating a utility that needs to know about every dataset and table that exists in my BigQuery project. My current code for getting this information is as follows (using the Go API):

func populateExistingTableMap(service *bigquery.Service, cloudCtx context.Context, projectId string) (map[string]map[string]bool, error) {
    tableMap := map[string]map[string]bool{}

    call := service.Datasets.List(projectId)
    //call.Fields("datasets/datasetReference")

    if err := call.Pages(cloudCtx, func(page *bigquery.DatasetList) error {
        for _, v := range page.Datasets {

            if tableMap[v.DatasetReference.DatasetId] == nil {
                tableMap[v.DatasetReference.DatasetId] = map[string]bool{}
            }

            table_call := service.Tables.List(projectId, v.DatasetReference.DatasetId)
            //table_call.Fields("tables/tableReference")

            if err := table_call.Pages(cloudCtx, func(page *bigquery.TableList) error {
                for _, t := range page.Tables {
                    tableMap[v.DatasetReference.DatasetId][t.TableReference.TableId] = true
                }
                return nil 
            }); err != nil {
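                // Keep the underlying error so the failing dataset can be identified.
                return fmt.Errorf("listing tables in dataset %s: %v", v.DatasetReference.DatasetId, err)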
            }
        }
        return nil 
    }); err != nil {
        return tableMap, err
    }

    return tableMap, nil
}

For a project with about 5000 datasets, each with up to 10 tables, this code takes almost 15 minutes to return. Is there a faster way to iterate through the names of all existing datasets/tables? I have tried using the Fields method to return only the fields I need (you can see those lines commented out above), but that results in only 50 (exactly 50) of my datasets being returned.

Any ideas?
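A likely explanation for the "exactly 50" result, offered here as an assumption rather than something confirmed in this thread: a field mask such as `datasets/datasetReference` also strips `nextPageToken` from the response, and a `Pages`-style loop stops as soon as it sees an empty token, so only the first page is ever read. The sketch below simulates that mechanism with the standard library only. The `page` type, `fetch`, `countAll`, and the page size of 50 are all hypothetical stand-ins, not part of the BigQuery client.

```go
package main

import (
	"fmt"
	"strconv"
)

// page mimics one page of a list response.
// An empty NextPageToken means "no more pages".
type page struct {
	Items         []int
	NextPageToken string
}

// pageSize is an assumed server default, chosen to match the 50 datasets observed.
const pageSize = 50

// fetch is a hypothetical server. It returns up to pageSize items starting at
// the offset encoded in token. When withToken is false it simulates a field
// mask that leaves nextPageToken out of the response.
func fetch(total int, token string, withToken bool) page {
	start, _ := strconv.Atoi(token) // an empty token parses as offset 0
	end := start + pageSize
	if end > total {
		end = total
	}
	var p page
	for i := start; i < end; i++ {
		p.Items = append(p.Items, i)
	}
	if withToken && end < total {
		p.NextPageToken = strconv.Itoa(end)
	}
	return p
}

// countAll keeps fetching until a response carries no next-page token,
// which is the loop shape assumed for a Pages-style helper.
func countAll(total int, withToken bool) int {
	n, token := 0, ""
	for {
		p := fetch(total, token, withToken)
		n += len(p.Items)
		if p.NextPageToken == "" {
			return n
		}
		token = p.NextPageToken
	}
}

func main() {
	fmt.Println(countAll(5000, false)) // token masked out: only the first page is counted
	fmt.Println(countAll(5000, true))  // token kept: every page is counted
}
```

If this explanation holds, the corresponding change with the real client would presumably be to request the token as well, for example `call.Fields("nextPageToken", "datasets/datasetReference")`. That call has not been verified against the service here.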

1 answer

  • dongyuan9892 2016-08-12 15:33

    Here is an updated version of my code, with concurrency, that reduced the processing time from about 15 minutes to 3 minutes.

    func populateExistingTableMap(service *bigquery.Service, cloudCtx context.Context, projectId string) (map[string]map[string]bool, error) {
        tableMap := map[string]map[string]bool{}
    
        call := service.Datasets.List(projectId)
        //call.Fields("datasets/datasetReference")
    
        if err := call.Pages(cloudCtx, func(page *bigquery.DatasetList) error {
            // List the tables of every dataset on this page concurrently.
            var wg sync.WaitGroup
            wg.Add(len(page.Datasets))
            for _, v := range page.Datasets {
                datasetID := v.DatasetReference.DatasetId
                // Create the inner map before the goroutine starts, so that no
                // goroutine touches the outer map while this loop writes to it.
                tables := tableMap[datasetID]
                if tables == nil {
                    tables = map[string]bool{}
                    tableMap[datasetID] = tables
                }

                go func(datasetID string, tables map[string]bool) {
                    defer wg.Done()
                    tableCall := service.Tables.List(projectId, datasetID)
                    //tableCall.Fields("tables/tableReference")
                    if err := tableCall.Pages(cloudCtx, func(page *bigquery.TableList) error {
                        for _, t := range page.Tables {
                            tables[t.TableReference.TableId] = true
                        }
                        return nil // NOTE: returning a non-nil error stops pagination.
                    }); err != nil {
                        // TODO: Handle error.
                        fmt.Println(err)
                    }
                }(datasetID, tables)
            }

            // Wait for this page's goroutines before fetching the next page.
            wg.Wait()
            return nil // NOTE: returning a non-nil error stops pagination.
        }); err != nil {
            return tableMap, err
        }
    
        return tableMap, nil
    }
    
    This answer was selected by the asker as the best answer.
