Writing a DataFrame to HBase from Python with the SHC framework garbles int columns, and reading them back into a DataFrame is garbled too
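# Load the source table over JDBC (sql_sc, jdbc_url_test and jdbc_driver are defined elsewhere)
df_test_HBase = sql_sc.read.format('jdbc').options(url=jdbc_url_test, driver=jdbc_driver, dbtable='testHBase').load()
df_test_HBase.createOrReplaceTempView("test_HBase")
# Workaround: cast the int columns (id, age, gender) to String before writing
df_cast_HBase = sql_sc.sql("select CAST(id as String) id,name,CAST(age as String) age,CAST(gender as String) gender,cat,tag,level from test_HBase")
df_cast_HBase.show()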
dep = "org.apache.spark.sql.execution.datasources.hbase"
catalog = """{
"table":{"namespace":"default", "name":"teacher", "tableCoder":"PrimitiveType"},
"rowkey":"key",
"columns":{
"id":{"cf":"rowkey", "col":"key", "type":"string"},
"name":{"cf":"teacherBase", "col":"name", "type":"string"},
"age":{"cf":"teacherBase", "col":"age", "type":"string"},
"gender":{"cf":"teacherBase", "col":"gender","type":"string"},
"cat":{"cf":"teacherDetails", "col":"cat","type":"string"},
"tag":{"cf":"teacherDetails", "col":"tag", "type":"string"},
"level":{"cf":"teacherDetails", "col":"level","type":"string"} }
} """
df_cast_HBase.write.options(catalog=catalog,newTable="5").format(dep).save()
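So far the only way I can avoid the garbled values is to cast the int columns to String with CAST and also change their type in the catalog from int to string before writing to HBase. That only treats the symptom, not the cause. Could anyone suggest a proper fix?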
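A possible explanation, offered as a sketch rather than a confirmed diagnosis: with the `PrimitiveType` coder, an `int` column is stored as its raw 4-byte big-endian binary form (the layout HBase's `Bytes.toBytes(int)` produces), whereas a `string` column is stored as the UTF-8 bytes of its digits. The HBase shell, or any reader that treats the cell as text, would then display the binary form as escape sequences. The snippet below only reproduces the two byte layouts in plain Python; the helper names and the sample value 25 are illustrative and not part of SHC.

```python
import struct


def encode_int_as_binary(value: int) -> bytes:
    """4-byte big-endian signed int, the layout Bytes.toBytes(int) uses."""
    return struct.pack(">i", value)


def encode_int_as_string(value: int) -> bytes:
    """UTF-8 bytes of the decimal digits, as stored after CAST(... as String)."""
    return str(value).encode("utf-8")


age = 25  # illustrative sample value

binary_cell = encode_int_as_binary(age)
string_cell = encode_int_as_string(age)

print(binary_cell)  # b'\x00\x00\x00\x19' (looks garbled when shown as text)
print(string_cell)  # b'25' (readable)

# The binary cell is not corrupted: decoding it with the same layout recovers the value.
decoded = struct.unpack(">i", binary_cell)[0]
print(decoded)  # 25
```

If this is what is happening, the data written with `"type":"int"` is intact and simply not human-readable as text. Whether it also explains the garbled read-back depends on the catalog used for reading, which is not shown here; a read catalog that declares a different type than the one used for writing would decode the bytes incorrectly.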
Before/after comparison screenshots: