weixin_39586915
2020-12-28 06:48

java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy

Facing issues when format.class is set to the Parquet data format.

Stack trace:

[2019-03-22 07:45:51,991] INFO Flushing mem columnStore to file. allocated memory: 64 (org.apache.parquet.hadoop.InternalParquetRecordWriter:160)
[2019-03-22 07:45:51,991] ERROR Error closing writer for uber7-0. Error: java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: BLOCK (io.confluent.connect.hdfs.DataWriter:467)
[2019-03-22 07:45:51,991] ERROR WorkerSinkTask{id=uberno-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:587)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:323)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:226)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:194)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
    at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
    at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
    at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
    at org.apache.parquet.hadoop.CodecFactory$BytesCompressor.compress(CodecFactory.java:112)
    at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:93)
    at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:150)
    at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:238)
    at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:121)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:167)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:109)
    at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:302)
    at io.confluent.connect.hdfs.parquet.ParquetRecordWriterProvider$1.close(ParquetRecordWriterProvider.java:95)
    at io.confluent.connect.hdfs.TopicPartitionWriter.closeTempFile(TopicPartitionWriter.java:660)
    at io.confluent.connect.hdfs.TopicPartitionWriter.closeTempFile(TopicPartitionWriter.java:667)
    at io.confluent.connect.hdfs.TopicPartitionWriter.write(TopicPartitionWriter.java:425)
    at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:380)
    at io.confluent.connect.hdfs.HdfsSinkTask.put(HdfsSinkTask.java:123)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:56

Plugin path is set to: confluent-5.1.2/share/java

This question comes from the open source project: confluentinc/kafka-connect-hdfs


8 replies

  • weixin_39639550, 3 months ago

    Command used to deploy Kafka Connect:

        ./bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties etc/kafka-connect-hdfs/quickstart-hdfs.properties

    quickstart-hdfs.properties:

        name=hdfs-sink
        connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
        tasks.max=1
        topics=test_hdfsc
        hdfs.url=hdfs://10.75.172.215:8020/user/test
        flush.size=3
        format.class=io.confluent.connect.hdfs.parquet.ParquetFormat
        partitioner.class=io.confluent.connect.hdfs.partitioner.HourlyPartitioner
        timezone=UTC
        locale=en

    Error seen:

    [2019-03-22 14:46:28,799] ERROR Error closing writer for test_hdfsc-0. Error: java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: BLOCK (io.confluent.connect.hdfs.DataWriter:460)
    [2019-03-22 14:46:28,799] ERROR WorkerSinkTask{id=hdfs-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
    org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:587)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:323)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:226)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:194)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy--b2f81622-5139-46bc-9638-3ec3dd82be8f-libsnappyjava.so: /tmp/snappy--b2f81622-5139-46bc-9638-3ec3dd82be8f-libsnappyjava.so: failed to map segment from shared object: Operation not permitted
        at java.lang.ClassLoader$NativeLibrary.load(Native Method)
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
        at java.lang.Runtime.load0(Runtime.java:809)
        at java.lang.System.load(S

    After that I gave 777 permissions to /tmp/snappy--b2f81622-5139-46bc-9638-3ec3dd82be8f-libsnappyjava.so and restarted the tasks, but I am still facing the above error.

  • weixin_39968820, 3 months ago

    hdfs.url does not expect a path. The expected format is documented here: https://docs.confluent.io/current/connect/kafka-connect-hdfs/configuration_options.html#hdfs

    That said, that alone should not cause the java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy exception. Could you elaborate on your environment? Also, does the libc6-compat workaround here work for you?
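
    For illustration only (a sketch, not from the thread): the host/port and the HDFS directory would be split between hdfs.url and topics.dir, the connector option for the parent directory the data is written under; the /user/test path is just carried over from the config above:

        hdfs.url=hdfs://10.75.172.215:8020
        topics.dir=/user/test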

  • weixin_39831567, 3 months ago

    Hi, I'm running into the same issue with kafka-connect-hdfs3.

        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
        at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
        at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
        at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
        at org.apache.parquet.hadoop.CodecFactory$BytesCompressor.compress(CodecFactory.java:112)
        at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:93)
        at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:150)
        at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:238)
        at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:121)
        at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:167)
        at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:109)
        at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:302)
        at io.confluent.connect.hdfs3.parquet.ParquetRecordWriterProvider$1.close(ParquetRecordWriterProvider.java:101)
        at io.confluent.connect.hdfs3.TopicPartitionWriter.closeTempFile(TopicPartitionWriter.java:687)
        at io.confluent.connect.hdfs3.TopicPartitionWriter.closeTempFile(TopicPartitionWriter.java:694)
        at io.confluent.connect.hdfs3.TopicPartitionWriter.write(TopicPartitionWriter.java:381)
        at io.confluent.connect.hdfs3.DataWriter.write(DataWriter.java:359)
        at io.confluent.connect.hdfs3.Hdfs3SinkTask.put(Hdfs3SinkTask.java:108)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:538)
        ... 10 more
    [2019-12-16 17:54:14,985] ERROR WorkerSinkTask{id=ds2_sink_book_pnlversion_v1-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:180)
    [2019-12-16 17:54:14,985] INFO [Consumer clientId=connector-consumer-ds2_sink_book_pnlversion_v1-0, groupId=connect-ds2_sink_book_pnlversion_v1] Member connector-consumer-ds2_sink_book_pnlversion_v1-0-d2197886-cf74-4af6-8be9-d7d74f7b3a06 sending LeaveGroup request to coordinator 9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:879)
    [2019-12-16 17:54:14,987] INFO Publish thread interrupted for client_id=connector-consumer-ds2_sink_book_pnlversion_v1-0 client_type=CONSUMER session= cluster=NV4qXOVlRtOAedA45AHcXg group=connect-ds2_sink_book_pnlversion_v1 (io.confluent.monitoring.clients.interceptor.MonitoringInterceptor:285)

  • weixin_39639550, 3 months ago

    The issue is with the Snappy native libraries being extracted into /tmp. By default /tmp is mounted with the noexec option, so the JVM cannot load shared libraries from a /tmp path.
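
    A quick way to confirm this (a sketch, assuming a Linux host with util-linux's findmnt available) is to print the mount options of /tmp and look for noexec:

        findmnt -no OPTIONS /tmp
        # example output: rw,nosuid,nodev,noexec,relatime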

    To resolve this issue, two approaches can be used:

    1. Start the JVM with -Dorg.xerial.snappy.tempdir=<path to a temp dir that has exec permission>, as sketched below, or
    2. Remount /tmp with exec permission: mount -o remount,exec /tmp
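
    As a sketch of approach 1 (the /opt/kafka-tmp path is hypothetical; any directory on an exec-mounted filesystem works), the property can be passed through KAFKA_OPTS, which Kafka's launch scripts forward to the JVM:

        # create an exec-enabled temp dir (hypothetical path)
        mkdir -p /opt/kafka-tmp
        export KAFKA_OPTS="-Dorg.xerial.snappy.tempdir=/opt/kafka-tmp"
        ./bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties \
            etc/kafka-connect-hdfs/quickstart-hdfs.properties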

  • weixin_39831567, 3 months ago

    Thank you, I will try it.

  • weixin_39968820, 3 months ago

    Did the approaches provided above work for you?

  • weixin_39865440, 3 months ago

    It did not work for me... :(

