2401_85582445 2024-08-29 01:06 采纳率: 0%
浏览 9
已结题

flink on yarn 启动报错,如何解决?

flink on yarn报错
开启Hadoop集群,在yarn上以per job模式运行

执行命令

[root@master bin]# /opt/module/flink-1.14.0/bin/flink run -m flink-cluster -yjm 1024 -ytm 1024 /opt/module/flink-1.14.0/examples/batch/WordCount.jar 

报错信息

 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.util.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
    at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
    at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.util.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
    at org.apache.flink.client.program.ContextEnvironment.getJobExecutionResult(ContextEnvironment.java:117)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:74)
    at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942)
    at org.apache.flink.api.java.DataSet.collect(DataSet.java:417)
    at org.apache.flink.api.java.DataSet.print(DataSet.java:1748)
    at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:96)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
    ... 11 more
Caused by: org.apache.flink.util.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
    at org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:395)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:476)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: master/192.168.186.101:40850
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
    at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
    ... 19 more
Caused by: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: master/192.168.186.101:40850
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:748)
  • 写回答

1条回答 默认 最新

  • 清水白石008 2024-08-29 05:24
    关注
    🕒上午好,题主!目前:🟢[在线]
    📢参考GPT-4o mini 模型、由清水白石008整理提供!
    📣今日金句:“生活中最重要的不是你所处的位置,而是你所朝的方向。” — 奥利弗·温德尔·霍姆斯
    

    根据你提供的错误信息,问题主要是由于 Flink 客户端无法连接到指定的 YARN ResourceManager 或 TaskManager。具体的错误信息是 Connection refused: master/192.168.186.101:40850,这表明在尝试连接到该地址时被拒绝了。以下是一些可能的解决方案和排查步骤:

    1. 检查 YARN 集群状态

    确保你的 YARN 集群正在运行,并且 ResourceManager 和 TaskManager 都已启动。你可以通过访问 YARN 的 Web UI(通常是 http://<ResourceManager_IP>:8088)来检查集群状态。

    2. 检查网络连接

    • 确保你在运行 Flink 的机器上能够访问 YARN ResourceManager 的 IP 地址和端口。可以使用 telnet 命令来测试连接:
      telnet 192.168.186.101 40850
      
      如果连接失败,可能是网络问题或防火墙设置导致的。

    3. 检查防火墙设置

    确保防火墙没有阻止 Flink 和 YARN 之间的通信。你可以暂时禁用防火墙进行测试,或者确保相关端口(如 8088、40850 等)已开放。

    4. 检查 Flink 配置

    确保 Flink 的配置文件(如 flink-conf.yaml)中的 YARN 配置正确。特别是以下配置项:

    • jobmanager.address:确保它指向正确的 ResourceManager 地址。
    • yarn.application.classpath:确保包含了所有必要的类路径。

    5. 检查日志文件

    查看 Flink 和 YARN 的日志文件,可能会提供更多的错误信息。Flink 的日志通常位于 flink-<version>/log 目录下,而 YARN 的日志可以通过 YARN Web UI 访问。

    6. 资源分配

    确保 YARN 集群有足够的资源(内存、CPU)来运行 Flink 作业。你可以在 YARN Web UI 中查看资源使用情况。

    7. 重新启动服务

    如果以上步骤都没有解决问题,尝试重新启动 YARN ResourceManager 和 TaskManager,或者重启整个 YARN 集群。

    8. 检查 Flink 版本

    确保你使用的 Flink 版本与 YARN 版本兼容。某些版本之间可能存在不兼容的问题。

    9. 运行示例作业

    尝试运行 Flink 自带的示例作业,看看是否能成功运行。这可以帮助你确认 Flink 和 YARN 的基本配置是否正常。

    如果在尝试以上步骤后仍然遇到问题,请提供更多的上下文信息,例如 YARN 和 Flink 的版本、配置文件的相关内容等,以便更好地帮助你解决问题。

    评论

报告相同问题?

问题事件

  • 已结题 (查看结题原因) 8月29日
  • 创建了问题 8月29日

悬赏问题

  • ¥15 如何在vue.config.js中读取到public文件夹下window.APP_CONFIG.API_BASE_URL的值
  • ¥50 浦育平台scratch图形化编程
  • ¥20 求这个的原理图 只要原理图
  • ¥15 vue2项目中,如何配置环境,可以在打完包之后修改请求的服务器地址
  • ¥20 微信的店铺小程序如何修改背景图
  • ¥15 UE5.1局部变量对蓝图不可见
  • ¥15 一共有五道问题关于整数幂的运算还有房间号码 还有网络密码的解答?(语言-python)
  • ¥20 sentry如何捕获上传Android ndk 崩溃
  • ¥15 在做logistic回归模型限制性立方条图时候,不能出完整图的困难
  • ¥15 G0系列单片机HAL库中景园gc9307液晶驱动芯片无法使用硬件SPI+DMA驱动,如何解决?