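I set up a Flink cluster in standalone mode. After startup, job submission and execution work normally, and GC also looked normal when I checked it. However, the JobManager shuts down on its own at night, and the log keeps filling up with WARN messages.
Flink version: 1.7.2, three machines. The web UI shows normal information.
Question: could the JobManager crashes be related to these log messages? I want the cluster to run stably. Currently the jobs only connect Kafka and Redis.
The WARN log is as follows: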
09-06 14:00:23,430 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: localhost/127.0.0.1:63408
2019-09-06 14:00:23,431 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink-metrics@localhost:63408] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink-metrics@localhost:63408]] Caused by: [Connection refused: localhost/127.0.0.1:63408]
2019-09-06 14:00:23,431 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: localhost/127.0.0.1:30060
2019-09-06 14:00:23,431 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink-metrics@localhost:30060] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink-metrics@localhost:30060]] Caused by: [Connection refused: localhost/127.0.0.1:30060]
The cluster startup log is as follows:
2019-09-06 13:50:33,581 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-09-06 13:50:33,582 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionClusterEntrypoint (Version: 1.7.2, Rev:ceba8af, Date:11.02.2019 @ 14:17:09 UTC)
2019-09-06 13:50:33,582 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: apps
2019-09-06 13:50:33,816 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-09-06 13:50:33,945 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: apps
2019-09-06 13:50:33,945 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.161-b12
2019-09-06 13:50:33,945 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 981 MiBytes
2019-09-06 13:50:33,945 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /apps/svr/jdk1.8.0_161
2019-09-06 13:50:33,947 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Hadoop version: 2.6.5
2019-09-06 13:50:33,947 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2019-09-06 13:50:33,948 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms1024m
2019-09-06 13:50:33,948 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx1024m
2019-09-06 13:50:33,948 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog.file=/home/apps/jfy/flink-1.7.2/log/flink-apps-standalonesession-6-arch-dev-rmq.log
2019-09-06 13:50:33,948 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:/home/apps/jfy/flink-1.7.2/conf/log4j.properties
2019-09-06 13:50:33,948 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:/home/apps/jfy/flink-1.7.2/conf/logback.xml
2019-09-06 13:50:33,948 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments:
2019-09-06 13:50:33,948 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --configDir
2019-09-06 13:50:33,948 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - /home/apps/jfy/flink-1.7.2/conf
2019-09-06 13:50:33,948 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --executionMode
2019-09-06 13:50:33,948 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
2019-09-06 13:50:33,948 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Classpath: /home/apps/jfy/flink-1.7.2/lib/flink-python_2.11-1.7.2.jar:/home/apps/jfy/flink-1.7.2/lib/flink-shaded-hadoop2-uber-1.7.2.jar:/home/apps/jfy/flink-1.7.2/lib/log4j-1.2.17.jar:/home/apps/jfy/flink-1.7.2/lib/slf4j-log4j12-1.7.15.jar:/home/apps/jfy/flink-1.7.2/lib/flink-dist_2.11-1.7.2.jar:::
2019-09-06 13:50:33,948 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-09-06 13:50:33,949 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-09-06 13:50:33,959 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, 172.31.50.59
2019-09-06 13:50:33,960 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2019-09-06 13:50:33,960 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2019-09-06 13:50:33,960 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
2019-09-06 13:50:33,960 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2019-09-06 13:50:33,960 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2019-09-06 13:50:33,960 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
2019-09-06 13:50:33,973 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionClusterEntrypoint.
2019-09-06 13:50:33,973 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install default filesystem.
2019-09-06 13:50:33,983 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install security context.
2019-09-06 13:50:34,016 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to apps (auth:SIMPLE)
2019-09-06 13:50:34,030 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Initializing cluster services.
2019-09-06 13:50:34,191 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Trying to start actor system at 172.31.50.59:6123
2019-09-06 13:50:34,520 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2019-09-06 13:50:34,571 INFO akka.remote.Remoting - Starting remoting
2019-09-06 13:50:34,726 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@172.31.50.59:6123]
2019-09-06 13:50:34,733 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor system started at akka.tcp://flink@172.31.50.59:6123
2019-09-06 13:50:34,747 WARN org.apache.flink.configuration.Configuration - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2019-09-06 13:50:34,757 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-c7a49a00-4241-463b-97d6-f01795c08cde
2019-09-06 13:50:34,760 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:22324 - max concurrent requests: 50 - max backlog: 1000
2019-09-06 13:50:34,774 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
2019-09-06 13:50:34,775 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Trying to start actor system at 172.31.50.59:0
2019-09-06 13:50:34,790 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2019-09-06 13:50:34,795 INFO akka.remote.Remoting - Starting remoting
2019-09-06 13:50:34,802 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink-metrics@172.31.50.59:44195]
2019-09-06 13:50:34,803 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system started at akka.tcp://flink-metrics@172.31.50.59:44195
2019-09-06 13:50:34,807 INFO org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore - Initializing FileArchivedExecutionGraphStore: Storage directory /tmp/executionGraphStore-be620752-bb92-49c0-9556-f93d802f61c2, expiration time 3600000, maximum cache size 52428800 bytes.
2019-09-06 13:50:34,834 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /tmp/blobStore-ac295e58-8bce-4747-80f5-086a3ddf6874
2019-09-06 13:50:34,850 WARN org.apache.flink.configuration.Configuration - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2019-09-06 13:50:34,851 WARN org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Upload directory /tmp/flink-web-59e5be3d-7736-4a43-ab10-3c5116bfe201/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2019-09-06 13:50:34,852 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Created directory /tmp/flink-web-59e5be3d-7736-4a43-ab10-3c5116bfe201/flink-web-upload for file uploads.
2019-09-06 13:50:34,855 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Starting rest endpoint.
2019-09-06 13:50:35,063 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component log file: /home/apps/jfy/flink-1.7.2/log/flink-apps-standalonesession-6-arch-dev-rmq.log
2019-09-06 13:50:35,063 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component stdout file: /home/apps/jfy/flink-1.7.2/log/flink-apps-standalonesession-6-arch-dev-rmq.out
2019-09-06 13:50:35,202 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Rest endpoint listening at 172.31.50.59:8081
2019-09-06 13:50:35,202 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://172.31.50.59:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2019-09-06 13:50:35,202 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http://172.31.50.59:8081.
2019-09-06 13:50:35,259 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
2019-09-06 13:50:35,274 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2019-09-06 13:50:35,288 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - ResourceManager akka.tcp://flink@172.31.50.59:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2019-09-06 13:50:35,289 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Starting the SlotManager.
2019-09-06 13:50:35,302 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher akka.tcp://flink@172.31.50.59:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
2019-09-06 13:50:35,305 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
2019-09-06 13:50:35,921 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Registering TaskManager with ResourceID d9ac21b93546848cee400e09e79bf55c (akka.tcp://flink@localhost:32199/user/taskmanager_0) at ResourceManager
2019-09-06 13:50:35,931 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Registering TaskManager with ResourceID e7f27036fca804c716fd6bada9f1e0d6 (akka.tcp://flink@localhost:28648/user/taskmanager_0) at ResourceManager