jerryyuen 2019-09-09 15:45 采纳率: 0%
浏览 3940

flink搭建standalone模式集群,jobmanager会自动挂掉,只有一直刷的warn日志

flink搭建standalone模式集群,启动后任务提交跟运行正常,gc情况观察了一下也正常,但是jobmanager到晚上会自动挂掉,而且一直刷的warn日志。

flink版本:1.7.2 三台机器,web界面信息正常。

问题:jobmanager会挂掉,跟这个日志是否有关呢?我希望集群可以稳定跑下去,目前任务只是对接kafka与redis。

warn日志如下:

09-06 14:00:23,430 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: localhost/127.0.0.1:63408
2019-09-06 14:00:23,431 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink-metrics@localhost:63408] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink-metrics@localhost:63408]] Caused by: [Connection refused: localhost/127.0.0.1:63408]
2019-09-06 14:00:23,431 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: localhost/127.0.0.1:30060
2019-09-06 14:00:23,431 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink-metrics@localhost:30060] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink-metrics@localhost:30060]] Caused by: [Connection refused: localhost/127.0.0.1:30060]

集群启动日志如下:

2019-09-06 13:50:33,581 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - --------------------------------------------------------------------------------
2019-09-06 13:50:33,582 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Starting StandaloneSessionClusterEntrypoint (Version: 1.7.2, Rev:ceba8af, Date:11.02.2019 @ 14:17:09 UTC)
2019-09-06 13:50:33,582 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  OS current user: apps
2019-09-06 13:50:33,816 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-09-06 13:50:33,945 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Current Hadoop/Kerberos user: apps
2019-09-06 13:50:33,945 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.161-b12
2019-09-06 13:50:33,945 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Maximum heap size: 981 MiBytes
2019-09-06 13:50:33,945 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JAVA_HOME: /apps/svr/jdk1.8.0_161
2019-09-06 13:50:33,947 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Hadoop version: 2.6.5
2019-09-06 13:50:33,947 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM Options:
2019-09-06 13:50:33,948 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Xms1024m
2019-09-06 13:50:33,948 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Xmx1024m
2019-09-06 13:50:33,948 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dlog.file=/home/apps/jfy/flink-1.7.2/log/flink-apps-standalonesession-6-arch-dev-rmq.log
2019-09-06 13:50:33,948 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dlog4j.configuration=file:/home/apps/jfy/flink-1.7.2/conf/log4j.properties
2019-09-06 13:50:33,948 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dlogback.configurationFile=file:/home/apps/jfy/flink-1.7.2/conf/logback.xml
2019-09-06 13:50:33,948 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Program Arguments:
2019-09-06 13:50:33,948 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     --configDir
2019-09-06 13:50:33,948 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     /home/apps/jfy/flink-1.7.2/conf
2019-09-06 13:50:33,948 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     --executionMode
2019-09-06 13:50:33,948 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     cluster
2019-09-06 13:50:33,948 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Classpath: /home/apps/jfy/flink-1.7.2/lib/flink-python_2.11-1.7.2.jar:/home/apps/jfy/flink-1.7.2/lib/flink-shaded-hadoop2-uber-1.7.2.jar:/home/apps/jfy/flink-1.7.2/lib/log4j-1.2.17.jar:/home/apps/jfy/flink-1.7.2/lib/slf4j-log4j12-1.7.15.jar:/home/apps/jfy/flink-1.7.2/lib/flink-dist_2.11-1.7.2.jar:::
2019-09-06 13:50:33,948 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - --------------------------------------------------------------------------------
2019-09-06 13:50:33,949 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-09-06 13:50:33,959 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, 172.31.50.59
2019-09-06 13:50:33,960 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2019-09-06 13:50:33,960 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 1024m
2019-09-06 13:50:33,960 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.size, 1024m
2019-09-06 13:50:33,960 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2019-09-06 13:50:33,960 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2019-09-06 13:50:33,960 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: rest.port, 8081
2019-09-06 13:50:33,973 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Starting StandaloneSessionClusterEntrypoint.
2019-09-06 13:50:33,973 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install default filesystem.
2019-09-06 13:50:33,983 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install security context.
2019-09-06 13:50:34,016 INFO  org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user set to apps (auth:SIMPLE)
2019-09-06 13:50:34,030 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Initializing cluster services.
2019-09-06 13:50:34,191 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Trying to start actor system at 172.31.50.59:6123
2019-09-06 13:50:34,520 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2019-09-06 13:50:34,571 INFO  akka.remote.Remoting                                          - Starting remoting
2019-09-06 13:50:34,726 INFO  akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink@172.31.50.59:6123]
2019-09-06 13:50:34,733 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Actor system started at akka.tcp://flink@172.31.50.59:6123
2019-09-06 13:50:34,747 WARN  org.apache.flink.configuration.Configuration                  - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2019-09-06 13:50:34,757 INFO  org.apache.flink.runtime.blob.BlobServer                      - Created BLOB server storage directory /tmp/blobStore-c7a49a00-4241-463b-97d6-f01795c08cde
2019-09-06 13:50:34,760 INFO  org.apache.flink.runtime.blob.BlobServer                      - Started BLOB server at 0.0.0.0:22324 - max concurrent requests: 50 - max backlog: 1000
2019-09-06 13:50:34,774 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl           - No metrics reporter configured, no metrics will be exposed/reported.
2019-09-06 13:50:34,775 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Trying to start actor system at 172.31.50.59:0
2019-09-06 13:50:34,790 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2019-09-06 13:50:34,795 INFO  akka.remote.Remoting                                          - Starting remoting
2019-09-06 13:50:34,802 INFO  akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink-metrics@172.31.50.59:44195]
2019-09-06 13:50:34,803 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Actor system started at akka.tcp://flink-metrics@172.31.50.59:44195
2019-09-06 13:50:34,807 INFO  org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore  - Initializing FileArchivedExecutionGraphStore: Storage directory /tmp/executionGraphStore-be620752-bb92-49c0-9556-f93d802f61c2, expiration time 3600000, maximum cache size 52428800 bytes.
2019-09-06 13:50:34,834 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Created BLOB cache storage directory /tmp/blobStore-ac295e58-8bce-4747-80f5-086a3ddf6874
2019-09-06 13:50:34,850 WARN  org.apache.flink.configuration.Configuration                  - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2019-09-06 13:50:34,851 WARN  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Upload directory /tmp/flink-web-59e5be3d-7736-4a43-ab10-3c5116bfe201/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2019-09-06 13:50:34,852 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Created directory /tmp/flink-web-59e5be3d-7736-4a43-ab10-3c5116bfe201/flink-web-upload for file uploads.
2019-09-06 13:50:34,855 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Starting rest endpoint.
2019-09-06 13:50:35,063 INFO  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Determined location of main cluster component log file: /home/apps/jfy/flink-1.7.2/log/flink-apps-standalonesession-6-arch-dev-rmq.log
2019-09-06 13:50:35,063 INFO  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Determined location of main cluster component stdout file: /home/apps/jfy/flink-1.7.2/log/flink-apps-standalonesession-6-arch-dev-rmq.out
2019-09-06 13:50:35,202 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Rest endpoint listening at 172.31.50.59:8081
2019-09-06 13:50:35,202 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - http://172.31.50.59:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2019-09-06 13:50:35,202 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web frontend listening at http://172.31.50.59:8081.
2019-09-06 13:50:35,259 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
2019-09-06 13:50:35,274 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2019-09-06 13:50:35,288 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - ResourceManager akka.tcp://flink@172.31.50.59:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2019-09-06 13:50:35,289 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Starting the SlotManager.
2019-09-06 13:50:35,302 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher akka.tcp://flink@172.31.50.59:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
2019-09-06 13:50:35,305 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Recovering all persisted jobs.
2019-09-06 13:50:35,921 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Registering TaskManager with ResourceID d9ac21b93546848cee400e09e79bf55c (akka.tcp://flink@localhost:32199/user/taskmanager_0) at ResourceManager
2019-09-06 13:50:35,931 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Registering TaskManager with ResourceID e7f27036fca804c716fd6bada9f1e0d6 (akka.tcp://flink@localhost:28648/user/taskmanager_0) at ResourceManager
  • 写回答

1条回答 默认 最新

  • dabocaiqq 2019-10-03 19:23
    关注
    评论

报告相同问题?

悬赏问题

  • ¥100 有人会搭建GPT-J-6B框架吗?有偿
  • ¥15 求差集那个函数有问题,有无佬可以解决
  • ¥15 【提问】基于Invest的水源涵养
  • ¥20 微信网友居然可以通过vx号找到我绑的手机号
  • ¥15 寻一个支付宝扫码远程授权登录的软件助手app
  • ¥15 解riccati方程组
  • ¥15 display:none;样式在嵌套结构中的已设置了display样式的元素上不起作用?
  • ¥15 使用rabbitMQ 消息队列作为url源进行多线程爬取时,总有几个url没有处理的问题。
  • ¥15 Ubuntu在安装序列比对软件STAR时出现报错如何解决
  • ¥50 树莓派安卓APK系统签名