1. Symptoms
DolphinScheduler version: 3.1.7
The error message is as follows:
Fault tolerance warning
[{"type":"WORKER","host":"/nodes/worker/dolphinscheduler-worker-1.dolphinscheduler-worker-headless:1234","event":"SERVER_DOWN","warningLevel":"SERIOUS"}]
2. Checking the logs
In the logs, every time the error was reported, the following entries appear at that same moment.
Key phrase: current cpu load average x is too high or available memory x is too low
[WARN] 2024-11-06 16:50:59.108 +0800 org.apache.dolphinscheduler.server.worker.task.WorkerHeartBeatTask:[101] - [WorkflowInstance-0][TaskInstance-0] - current cpu load average 236.01 is too high or available memory 14.36G is too low, under max.cpuload.avg=160.0 and reserved.memory=0.3G
[INFO] 2024-11-06 16:50:59.109 +0800 org.apache.dolphinscheduler.server.worker.task.WorkerHeartBeatTask:[89] - [WorkflowInstance-0][TaskInstance-0] - Success write worker group heartBeatInfo into registry, workerRegistryPath: /nodes/worker/dolphinscheduler-worker-0.dolphinscheduler-worker-headless:1234 workerHeartBeatInfo: {"startupTime":1730882939081,"reportTime":1730883059108,"cpuUsage":0.39,"memoryUsage":0.94,"loadAverage":236.01,"availablePhysicalMemorySize":14.36,"maxCpuloadAvg":160.0,"reservedMemory":0.3,"diskAvailable":283.33,"serverStatus":1,"processId":8,"workerHostWeight":100,"workerWaitingTaskCount":0,"workerExecThreadCount":100}
[WARN] 2024-11-06 16:51:09.111 +0800 org.apache.dolphinscheduler.server.worker.task.WorkerHeartBeatTask:[101] - [WorkflowInstance-0][TaskInstance-0] - current cpu load average 204.41 is too high or available memory 14.78G is too low, under max.cpuload.avg=160.0 and reserved.memory=0.3G
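The warning text states its own check: the worker warns when the load average is above max.cpuload.avg or the available memory is below reserved.memory. The sketch below assumes exactly that OR condition, which is inferred from the wording of the message and not taken from the DolphinScheduler source. It parses the two WARN lines and reports which condition actually tripped. `WARN_PATTERN` and `diagnose` are hypothetical helpers written for illustration only.

```python
import re

# Matches the WorkerHeartBeatTask overload warning as it appears in the logs above.
WARN_PATTERN = re.compile(
    r"current cpu load average (?P<load>[\d.]+) is too high or "
    r"available memory (?P<mem>[\d.]+)G is too low, under "
    r"max\.cpuload\.avg=(?P<max_load>[\d.]+) and reserved\.memory=(?P<reserved>[\d.]+)G"
)


def diagnose(line):
    """Return which overload condition(s) a worker WARN line satisfies, or None if it doesn't match."""
    m = WARN_PATTERN.search(line)
    if m is None:
        return None
    load = float(m.group("load"))
    mem = float(m.group("mem"))
    max_load = float(m.group("max_load"))
    reserved = float(m.group("reserved"))
    # Assumption: the warning fires when EITHER condition holds (inferred from "or" in the message).
    return {
        "cpu_overloaded": load > max_load,
        "memory_low": mem < reserved,
        "load_ratio": round(load / max_load, 2),  # how far the load exceeds the limit
    }


# The two WARN lines from the worker log (prefixes shortened).
logs = [
    "[WARN] 2024-11-06 16:50:59.108 +0800 ... - current cpu load average 236.01 is too high "
    "or available memory 14.36G is too low, under max.cpuload.avg=160.0 and reserved.memory=0.3G",
    "[WARN] 2024-11-06 16:51:09.111 +0800 ... - current cpu load average 204.41 is too high "
    "or available memory 14.78G is too low, under max.cpuload.avg=160.0 and reserved.memory=0.3G",
]

for line in logs:
    print(diagnose(line))
```

For both lines only the CPU condition holds: 14.36G and 14.78G of available memory are far above the 0.3G reserve, so the load average (roughly 1.3 to 1.5 times the 160.0 limit) is what triggers the warning.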
3. How can I resolve the problem above?