timruning 2016-09-15 15:52 采纳率: 50%
浏览 2835

Spark读取错误PrematureEOFfrominputStream

:主要问题java.io.EOFException: Premature EOF from inputStream
使用textFile或者newAPIHadoopFile都出现这个错误
写spark读取数据的时候一直报这个错误。
连count,repartition都过不去。数据读的比平常慢的多。
看数据文件,应该是很均匀的,应该不是数据倾斜的问题了吧。
下面是报错信息:

 16/09/15 23:27:57 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 41 in stage 0.0 failed 4 times, most recent failure: Lost task 41.3 in stage 0.0 (TID 5736, dn076179.heracles.sohuno.com): java.io.EOFException: Premature EOF from inputStream
    at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
    at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
    at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
    at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:102)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
    Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 41 in stage 0.0 failed 4 times, most recent failure: Lost task 41.3 in stage 0.0 (TID 5736, dn076179.heracles.sohuno.com): java.io.EOFException: Premature EOF from inputStream
    at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
    at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
    at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
    at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:102)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

  • 写回答

1条回答 默认 最新

  • devmiao 2016-09-15 15:56
    关注
    评论

报告相同问题?

悬赏问题

  • ¥15 如何在scanpy上做差异基因和通路富集?
  • ¥20 关于#硬件工程#的问题,请各位专家解答!
  • ¥15 关于#matlab#的问题:期望的系统闭环传递函数为G(s)=wn^2/s^2+2¢wn+wn^2阻尼系数¢=0.707,使系统具有较小的超调量
  • ¥15 FLUENT如何实现在堆积颗粒的上表面加载高斯热源
  • ¥30 截图中的mathematics程序转换成matlab
  • ¥15 动力学代码报错,维度不匹配
  • ¥15 Power query添加列问题
  • ¥50 Kubernetes&Fission&Eleasticsearch
  • ¥15 報錯:Person is not mapped,如何解決?
  • ¥15 c++头文件不能识别CDialog