jerryliun asked on 2017.09.06 16:27

Spark (with built-in Hive) cannot read data from a main table (partitioned collection)

[Problem description]
Spark (with built-in Hive) cannot read data from a main table (partitioned collection), while non-main tables read fine. Spark version: spark-1.3.0-bin-hadoop2.4
Jars in use:
spark-sequoiadb-1.12.jar
sequoiadb-driver-1.12.jar
hadoop-sequoiadb-1.12.jar
hive-sequoiadb-1.12.jar
postgresql-9.4-1201-jdbc41.jar
Querying the main table fails as follows:
select * from test201607_cs.tb_order limit 1;

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 16.0 failed 4 times, most recent failure: Lost task 0.3 in stage 16.0 (TID 362, sdb-223.3golden.hq): com.sequoiadb.exception.BaseException: errorType:SDB_DMS_CS_NOTEXIST,Collection space does not exist
Exception Detail:test201607_cs
at com.sequoiadb.base.Sequoiadb.getCollectionSpace(Sequoiadb.java:598)
at com.sequoiadb.hive.SdbReader.(SdbReader.java:145)
at com.sequoiadb.hive.SdbHiveInputFormat.getRecordReader(SdbHiveInputFormat.java:120)
at org.apache.spark.rdd.HadoopRDD$anon$1.(HadoopRDD.scala:236)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)

Querying a non-main table works:
select * from test201607_cs.test_hive limit 1;

+----------+
| shop_id |
+----------+
| 10048 |
+----------+

1 answer

SequoiaDB_Official   2017.09.06 16:30
Accepted answer

You can map the collection through Spark's SequoiaDB connector instead:

CREATE TABLE st_order ( shop_id string, `date` string ) USING com.sequoiadb.spark OPTIONS ( host 'localhost:11810', collectionspace 'test201607_cs', collection 'st_order');

Note that reserved words such as date must be wrapped in backticks (``).
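Once the mapping above succeeds, the main table can be queried through Spark SQL directly; since the connector talks to the SequoiaDB coordinator itself rather than going through the Hive InputFormat, the SDB_DMS_CS_NOTEXIST failure above does not apply. A minimal sketch, assuming the table definition and column names from the CREATE TABLE statement above:

```sql
-- Query the main table via the Spark-SequoiaDB connector mapping.
-- Column names (shop_id, date) are taken from the CREATE TABLE above;
-- `date` is a reserved word and must stay backtick-quoted.
SELECT shop_id, `date`
FROM st_order
LIMIT 1;
```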
