m0_37608723 2018-08-09 08:14 采纳率: 40%
浏览 6067
已采纳

多线程向flink集群提交任务失败

Caused by: org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Could not upload the program's JAR files to the JobManager.
at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:454)
at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:99)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:76)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:345)
... 14 common frames omitted
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Could not upload the program's JAR files to the JobManager.
at org.apache.flink.runtime.client.JobClient.submitJobDetached(JobClient.java:410)
at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:451)
... 19 common frames omitted
Caused by: java.io.IOException: Could not retrieve the JobManager's blob port.
at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:745)
at org.apache.flink.runtime.jobgraph.JobGraph.uploadUserJars(JobGraph.java:565)
at org.apache.flink.runtime.client.JobClient.submitJobDetached(JobClient.java:407)
... 20 common frames omitted
Caused by: java.io.IOException: PUT operation failed: Could not transfer error message
at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:512)
at org.apache.flink.runtime.blob.BlobClient.put(BlobClient.java:374)
at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:771)
at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:740)
... 22 common frames omitted
Caused by: java.io.IOException: Could not transfer error message
at org.apache.flink.runtime.blob.BlobClient.readExceptionFromStream(BlobClient.java:799)
at org.apache.flink.runtime.blob.BlobClient.receivePutResponseAndCompare(BlobClient.java:537)
at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:508)
... 25 common frames omitted
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.ipc.RemoteException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:64)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:501)
at java.lang.Throwable.readObject(Throwable.java:914)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290)
at org.apache.flink.runtime.blob.BlobClient.readExceptionFromStream(BlobClient.java:795)
... 27 common frames omitted

  • 写回答

2条回答 默认 最新

  • m0_37608723 2018-09-11 02:45
    关注

    自己回复一下 对于flink提交任务时 会将任务对应的jar文件上传至远程主机(如何上传因集群部署方式不同而不同),最终存储到hdfs上,然后taskmanager会去hdfs上下载此文件。
    上传文件时,会生成对应的文件名,而文件名是根据jar包的字节码生成的(极端的说,即便jar包对应的源代码中多了一个空格,生成的文件名都不会相同)。
    所以,同一个jar会生成同样的文件名,而它又在同样的路径中,这时就会出现多线程对同一文件读写,典型的多线程访问同一资源的问题。这也就是导致上述问题的根源。

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
  • m0_37608723 2018-08-09 08:25
    关注

    多线程提交10个任务,其中大部分成功,部分失败。以上是失败的报错信息。我对着源码看了,问题出在向jobclient类中:当blobClient.put(is)时
    (其中is为jar对应的inputstream),使用一个与blob server连接的socket向集群提交jar。
    提交完成后会用这个socket获取返回数据:final InputStream is = this.socket.getInputStream();,然而这个返回的状态码为1,其含义为:Internal code to identify an erroneous operation.。然后又调用readExceptionFromStream方法解析报错信息,这个解析报错的方法也报错了: java.lang.ClassNotFoundException: org.apache.hadoop.ipc.RemoteException。有没有大神指点一下,给指个方向。感谢!!!

    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 Google Chrome 所有页面崩溃,三种解决方案都没有解决,我崩溃了
  • ¥18 如何用c++编写数学规律题
  • ¥20 使用uni-app发起网络请求,获取重定向302返回的cookie
  • ¥20 手机外部浏览器拉起微信小程序支付 (相关搜索:微信小程序)
  • ¥20 怎样通过一个网址找到其他同样模版的网址
  • ¥30 XIAO esp32c3 读取FDC2214的数据
  • ¥15 在工控机(Ubuntu系统)上外接USB蓝牙硬件进行蓝牙通信
  • ¥15 关于PROCEDURE和FUNCTION的问题
  • ¥100 webapi的部署(标签-服务器)
  • ¥20 怎么加快手机软件内部计时的时间(关键词-日期时间)