m0_37608723
2018-08-09 08:14

Submitting jobs to a Flink cluster from multiple threads fails

Caused by: org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Could not upload the program's JAR files to the JobManager.
at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:454)
at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:99)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:76)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:345)
... 14 common frames omitted
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Could not upload the program's JAR files to the JobManager.
at org.apache.flink.runtime.client.JobClient.submitJobDetached(JobClient.java:410)
at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:451)
... 19 common frames omitted
Caused by: java.io.IOException: Could not retrieve the JobManager's blob port.
at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:745)
at org.apache.flink.runtime.jobgraph.JobGraph.uploadUserJars(JobGraph.java:565)
at org.apache.flink.runtime.client.JobClient.submitJobDetached(JobClient.java:407)
... 20 common frames omitted
Caused by: java.io.IOException: PUT operation failed: Could not transfer error message
at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:512)
at org.apache.flink.runtime.blob.BlobClient.put(BlobClient.java:374)
at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:771)
at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:740)
... 22 common frames omitted
Caused by: java.io.IOException: Could not transfer error message
at org.apache.flink.runtime.blob.BlobClient.readExceptionFromStream(BlobClient.java:799)
at org.apache.flink.runtime.blob.BlobClient.receivePutResponseAndCompare(BlobClient.java:537)
at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:508)
... 25 common frames omitted
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.ipc.RemoteException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:64)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:501)
at java.lang.Throwable.readObject(Throwable.java:914)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290)
at org.apache.flink.runtime.blob.BlobClient.readExceptionFromStream(BlobClient.java:795)
... 27 common frames omitted


2 answers

  • m0_37608723 2018-09-11 02:45
    Accepted answer

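    Answering my own question. When a job is submitted, Flink uploads the job's jar file to the remote host (how the upload happens depends on how the cluster is deployed). The file is ultimately stored on HDFS, and the TaskManagers then download it from HDFS.
    During the upload a file name is generated for the jar, and that name is derived from the jar's byte content (in the extreme case, a single extra space in the source code behind the jar produces a different file name).
    So the same jar always yields the same file name, stored under the same path. Submitting it from several threads therefore means several threads reading and writing the same file at once: a classic case of concurrent access to a shared resource. That is the root cause of the error above.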
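
    To illustrate the point about content-derived file names, here is a minimal sketch. It is not Flink's code: SHA-1 is assumed as the digest, and `SubmitGuardSketch`, `contentKey` and `submitSerialized` are hypothetical names made up for this example. It shows that identical jar bytes map to the same key, and one possible client-side workaround: taking a lock per key so that identical jars are uploaded one at a time.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SubmitGuardSketch {
    // One lock object per distinct jar content (hypothetical helper state).
    private static final ConcurrentMap<String, Object> LOCKS = new ConcurrentHashMap<>();

    /** Hex-encoded SHA-1 of the bytes; a stand-in for the generated file name. */
    static String contentKey(byte[] jarBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(jarBytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Runs the submission while holding the lock for this jar's content key. */
    static void submitSerialized(byte[] jarBytes, Runnable submit) {
        Object lock = LOCKS.computeIfAbsent(contentKey(jarBytes), k -> new Object());
        synchronized (lock) {
            submit.run();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        byte[] jar = "class A {}".getBytes(StandardCharsets.UTF_8);
        byte[] changed = "class A { }".getBytes(StandardCharsets.UTF_8);
        System.out.println("same bytes, same key: "
                + contentKey(jar).equals(contentKey(jar.clone())));
        System.out.println("one extra space, same key: "
                + contentKey(jar).equals(contentKey(changed)));

        // Ten threads submit the same jar; the lock lets only one run at a time.
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 10; i++) {
            final int job = i;
            // Placeholder for the real submission call.
            pool.execute(() -> submitSerialized(jar,
                    () -> System.out.println("submitting job " + job)));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

    A lock like this only serializes submissions made from a single client JVM; it does nothing for several client processes uploading the same jar at once.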

  • m0_37608723 2018-08-09 08:25

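    I submitted 10 jobs from multiple threads; most succeeded, but some failed. The stack trace above is from the failed ones. I traced it through the source, and the problem is on the JobClient path: at blobClient.put(is)
    (where is is the InputStream for the jar), the jar is sent to the cluster over a socket connected to the blob server.
    Once the upload finishes, the same socket is used to read the response: final InputStream is = this.socket.getInputStream();. The status code returned is 1, which means "Internal code to identify an erroneous operation." The readExceptionFromStream method is then called to parse the error message, and that method fails as well, with java.lang.ClassNotFoundException: org.apache.hadoop.ipc.RemoteException. Could anyone point me in the right direction? Thanks!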
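
    The failure mode described above can be reproduced in isolation. The sketch below is a simplification, not Flink's actual wire format: it assumes a response made of one status byte followed by a Java-serialized Throwable. `ErrorResponseSketch`, `ServerOnlyException`, `errorResponse` and `readResponse` are hypothetical names, and the overridden `resolveClass` only simulates an exception class that exists on the server but is missing from the client classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

public class ErrorResponseSketch {
    static final int RETURN_OKAY = 0;
    static final int RETURN_ERROR = 1;

    /** Hypothetical stand-in for an exception type known only to the server. */
    static class ServerOnlyException extends IOException {
        ServerOnlyException(String message) {
            super(message);
        }
    }

    /** Builds an error response: status byte, then the serialized exception. */
    static byte[] errorResponse(Throwable t) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(RETURN_ERROR);
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(t);
        }
        return bos.toByteArray();
    }

    /**
     * Reads a response. Any class named missingClass is treated as absent
     * from the client classpath, which makes readObject() fail.
     */
    static String readResponse(byte[] response, String missingClass) throws IOException {
        InputStream in = new ByteArrayInputStream(response);
        if (in.read() == RETURN_OKAY) {
            return "ok";
        }
        try (ObjectInputStream ois = new ObjectInputStream(in) {
            @Override
            protected Class<?> resolveClass(ObjectStreamClass desc)
                    throws IOException, ClassNotFoundException {
                if (desc.getName().equals(missingClass)) {
                    throw new ClassNotFoundException(desc.getName());
                }
                return super.resolveClass(desc);
            }
        }) {
            Throwable t = (Throwable) ois.readObject();
            return "server error: " + t;
        } catch (ClassNotFoundException e) {
            // The real server-side error is lost; only the class name survives.
            return "server error of unknown type: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] response = errorResponse(new ServerOnlyException("upload failed"));
        System.out.println(readResponse(response, "no.such.Class"));
        System.out.println(readResponse(response, ServerOnlyException.class.getName()));
    }
}
```

    Read this way, the ClassNotFoundException in the trace is a secondary failure: the server did report an error, but the client could not deserialize it because org.apache.hadoop.ipc.RemoteException was not on its classpath. Putting the Hadoop classes on the client classpath would presumably let the original server-side error show up, though that is an assumption the thread does not confirm.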
