各位大神刚开始学spark sql想处理json数据,一般的json数据没问题,但是当json串中有json嵌套数组时,就不太清楚怎样获取这个数据里每一项的数据,请各位指点。格式如
{"name":"Yin","address":[{"city":"Columbus","state":"防守打法"},{"city":"Columbus2","state":"防守打法"}]}我想获取address中的每一个数据项,应该怎么弄比较好?谢谢!
spark sql中处理json嵌套数组的方法
- 写回答
- 好问题 0 提建议
- 追加酬金
- 关注问题
- 邀请回答
-
2条回答 默认 最新
- liangxiaoxia 2017-06-19 06:59关注
别沉了,补个代码
val conf =new SparkConf().setAppName("test2").setMaster("local")
val sc =new SparkContext(conf)
var sqlContext= new SQLContext(sc)var anotherPeopleRDD=sc.parallelize("""{"name":"Yin","address":[{"city":"Columbus","state":"http://www.tom.com"},{"city":"Columbus2","state":"http://www.tom.com.cn"}]}"""::Nil)
sqlContext.udf.register("testaddress",(arr:Array[Row])=>{
})val anotherPeople = sqlContext.jsonRDD(anotherPeopleRDD)
anotherPeople.printSchema
anotherPeople.registerTempTable("people")
val pairs = sqlContext.sql("select testaddress(address) from people ")
pairs.collect()我想用udf 但是总报错: java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [Lorg.apache.spark.sql.Row; at com.dwnews.DMP.ETL.service.FileFilterService$$anonfun$test2$1.apply(FileFilterService.scala:100) at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:75) at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:74) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:964) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(Unknown Source) at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$2.apply(basicOperators.scala:55) at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$2.apply(basicOperators.scala:53) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:909) at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:909) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
解决 无用评论 打赏 举报
悬赏问题
- ¥15 #MATLAB仿真#车辆换道路径规划
- ¥15 java 操作 elasticsearch 8.1 实现 索引的重建
- ¥15 数据可视化Python
- ¥15 要给毕业设计添加扫码登录的功能!!有偿
- ¥15 kafka 分区副本增加会导致消息丢失或者不可用吗?
- ¥15 微信公众号自制会员卡没有收款渠道啊
- ¥100 Jenkins自动化部署—悬赏100元
- ¥15 关于#python#的问题:求帮写python代码
- ¥20 MATLAB画图图形出现上下震荡的线条
- ¥15 关于#windows#的问题:怎么用WIN 11系统的电脑 克隆WIN NT3.51-4.0系统的硬盘