laijunlin_data 2023-09-20 08:30 采纳率: 58.3%
浏览 58

hive查hudi表报错

#hive查hudi的外部表tableName_rt表报错java.lang.IllegalArgumentException: HoodieRealtimeRecordReader can only work on RealtimeSplit
#版本情况
hive 3.1.2
flink 1.13.6
hudi 0.12.0
#情况详述
hudi的MOR表在hive中建了外部表,在hive客户端查询tableName_ro表就没有问题;如果只是简单的查询tableName_rt表也没有问题,但是如果查询tableName_rt使用了distinct,或者group by,或者order by等关键字就报错.

org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:217)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
    ... 11 more
Caused by: java.lang.IllegalArgumentException: HoodieRealtimeRecordReader can only work on RealtimeSplit and not with hdfs://sgsdatacluster/user/hudi/warehouse/test_hudi/t3_hudi/9a2c9370-303e-4b14-b5e5-e88798551226_0-1-0_20230914170209306.parquet:0+434505
    at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
    at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:61)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:99)
    ... 16 more

#解决方法
在hive客户端设置一个参数
set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
报错信息就变成

org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.parquet.schema.Type.getLogicalTypeAnnotation()Lorg/apache/parquet/schema/LogicalTypeAnnotation;
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:296)
    at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:275)
    at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:264)
    at org.apache.hudi.common.table.TableSchemaResolver.convertParquetSchemaToAvro(TableSchemaResolver.java:272)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:117)
    at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:537)
    at org.apache.hudi.util.Lazy.get(Lazy.java:53)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromLatestCommitMetadata(TableSchemaResolver.java:208)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaInternal(TableSchemaResolver.java:176)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:138)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:127)
    at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:90)
    at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:72)
    at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:62)
    at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
    at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
    at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:74)
    at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat$HoodieCombineFileInputFormatShim.getRecordReader(HoodieCombineHiveInputFormat.java:975)
    at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getRecordReader(HoodieCombineHiveInputFormat.java:556)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

  • 写回答

1条回答 默认 最新

  • 专家-赤兔[在线] 全栈领域优质创作者 2023-09-20 08:34
    关注

    引用 皆我百晓生 小程序回复内容作答:
    根据报错信息和问题描述,问题可能是由于Hive默认的输入格式不兼容Hudi的实时表(tableName_rt),导致查询时报错。尝试使用Hudi提供的HoodieCombineHiveInputFormat作为输入格式,可以解决这个问题。

    在Hive客户端中执行以下命令:

    set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
    

    这样设置后,再次查询tableName_rt表,应该就不会出现报错了。

    评论

报告相同问题?

问题事件

  • 修改了问题 9月20日
  • 创建了问题 9月20日

悬赏问题

  • ¥15 下载ctorch报错,求解
  • ¥15 如何入门学习c语言,单片机
  • ¥15 idea 编辑语言的选择
  • ¥15 Windows下部署Asmjit
  • ¥15 请问双层规划模型的上下层目标函数不一致,是如何保证迭代收敛性的
  • ¥15 微信小程序 前端页面内容搜索
  • ¥15 cpu是如何判断当前指令已经执行完毕,然后去执行下条指令的
  • ¥15 安装visual studio2022时visualstudiosetup启动不了,闪退。问题代号0x0和0x1389
  • ¥30 java spring boot2.5.3版本websocket连不上
  • ¥15 angular js调外部链接查看pdf