douwo1517 2017-03-23 20:26
Status: answer accepted

Database design in BigQuery

I have a main table called "Acquisition" with multiple columns that would reference other tables (e.g. "Source", "Application", etc.). For example, "Source" would have a set of possible values, each of which would be used in many rows of the "Acquisition" table. What bothers me a bit is that the rows of the "Acquisition" table would return data that looks like this:

id > 1 ; value > 23.4 ; source_id > 1 ; application_id > 3 ; platform_id > 1 ; country_id > 1 ; etc.

Do you think there's another way to design it to make it more readable / user-friendly?

Here's an extract of the code of the schema:

// Main fact table: one row per acquisition, with dimensions referenced by id.
acquisitionSchema = bigquery.Schema{
    &bigquery.FieldSchema{Name: "id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "value", Required: true, Type: bigquery.FloatFieldType},
    &bigquery.FieldSchema{Name: "source_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "application_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "platform_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "country_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "adtype_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "date", Required: true, Type: bigquery.DateFieldType},
    &bigquery.FieldSchema{Name: "download", Required: false, Type: bigquery.IntegerFieldType},
}

// Lookup table: maps a source id to its display value.
sourceSchema = bigquery.Schema{
    &bigquery.FieldSchema{Name: "id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "value", Required: true, Type: bigquery.StringFieldType},
}

I thought of storing the values of the source, platform, etc. directly, but that might get messy because I get my data from multiple sources through APIs, unless I add all the necessary checks in my code.

Thanks!

1 answer

  • dongliang1873 2017-03-23 20:56

    Usually we use a RECORD that has two fields (id, name):

    -country
     |id
     |name
    

    This way, in our queries we can use country.id to filter by integer, or country.name to display the value for quick inspection.
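
As a rough sketch of what such a row could look like, the Go program below models each dimension as a nested struct and prints one row as a line of newline-delimited JSON. The type names (`Dimension`, `AcquisitionRow`), the helper `encodeRow`, the integer id type, and the sample values are assumptions made for illustration; only two of the dimensions from the question are shown. In the schema itself, each dimension would presumably become a field of type `bigquery.RecordFieldType` with its own nested `Schema` holding `id` and `name`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Dimension mirrors a RECORD column with two sub-fields: id and name.
// The integer id follows the answer; the question's schema uses strings.
type Dimension struct {
	ID   int64  `json:"id"`
	Name string `json:"name"`
}

// AcquisitionRow is a hypothetical, trimmed-down row of the main table.
type AcquisitionRow struct {
	ID      string    `json:"id"`
	Value   float64   `json:"value"`
	Source  Dimension `json:"source"`
	Country Dimension `json:"country"`
}

// encodeRow renders a row as a single line of JSON.
func encodeRow(r AcquisitionRow) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Sample values are made up for illustration.
	row := AcquisitionRow{
		ID:      "1",
		Value:   23.4,
		Source:  Dimension{ID: 1, Name: "example-network"},
		Country: Dimension{ID: 1, Name: "France"},
	}
	line, err := encodeRow(row)
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

Each row then carries both the id to filter on and the name to read, so a quick inspection of the table needs no join against the lookup tables.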

    Since storage is cheap nowadays, we can afford to store the literal name in every row. BQ is append-only by design and we usually read the most recent row, so that row already contains the fresh value if the name has changed in the meantime. Using the LAST_VALUE function, we can always pick the most recent record, which holds the latest name.
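
To make the "latest name wins" idea concrete, here is a small in-memory sketch in Go. It is not a BigQuery query; it only mimics what LAST_VALUE would return over a window partitioned by country id and ordered by date. The `Row` type, the `latestNames` helper, and the sample data are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical, trimmed-down acquisition row.
type Row struct {
	Date        string // ISO date, so string order equals chronological order
	CountryID   int64
	CountryName string
}

// latestNames returns, for each country id, the name stored on the most
// recent row for that id.
func latestNames(rows []Row) map[int64]string {
	// Sort a copy by date so the caller's slice is left untouched.
	sorted := make([]Row, len(rows))
	copy(sorted, rows)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Date < sorted[j].Date
	})

	names := make(map[int64]string)
	for _, r := range sorted {
		names[r.CountryID] = r.CountryName // later rows overwrite earlier ones
	}
	return names
}

func main() {
	// Sample data is made up: country 1 was renamed between two appends.
	rows := []Row{
		{Date: "2017-03-20", CountryID: 1, CountryName: "Netherlands"},
		{Date: "2017-03-01", CountryID: 1, CountryName: "Holland"},
		{Date: "2017-03-10", CountryID: 2, CountryName: "France"},
	}
	names := latestNames(rows)
	fmt.Println(names[1], names[2])
}
```

Old rows keep the name they were written with, so nothing has to be updated in place; readers that want the current name simply take it from the newest row.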

    This answer was selected by the asker as the best answer.
