  1. Apache Storm
  2. STORM-3784

My supervisor shuts down at 2:00 am every day

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: storm-server
    • Labels: None
    • Environment: CentOS 7 x64

    Description

      The cluster has one nimbus and two supervisors. One of the supervisors runs on the same host as nimbus.

      I deployed two topologies, PradarLinkTopology and PradarLogTopology.

      PradarLogTopology runs with 4 workers. PradarLinkTopology runs with 1 worker.

      At 2:00 am every day, all supervisors shut down. I haven't found the reason.

      I tried cleaning up the status directory, but the problem still exists.

      This is my supervisor.log:

      2021-07-21 02:03:42.070 o.a.s.u.Utils Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28:Error occurred during initialization of VM
      2021-07-21 02:03:42.070 o.a.s.u.Utils Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28:Error occurred during initialization of VM
      2021-07-21 02:03:42.071 o.a.s.u.Utils Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28:java.lang.Error: Properties init: Could not determine current working directory.
      2021-07-21 02:03:42.071 o.a.s.u.Utils Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28: at java.lang.System.initProperties(Native Method)
      2021-07-21 02:03:42.071 o.a.s.u.Utils Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28: at java.lang.System.initializeSystemClass(System.java:1166)
      2021-07-21 02:03:42.071 o.a.s.u.Utils Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28:
      2021-07-21 02:03:42.323 o.a.s.d.s.BasicContainer SLOT_6702 [INFO] Removed Worker ID dcae9231-4be4-4842-9ed0-988e1b8a2b28
      2021-07-21 02:03:42.329 o.a.s.d.s.Slot SLOT_6702 [INFO] STATE kill msInState: 68588 topo:PradarLogTopology-3-1626751922 worker:null -> empty msInState: 3
      2021-07-21 02:03:42.329 o.a.s.d.s.Slot SLOT_6702 [INFO] SLOT 6702: Changing current assignment from LocalAssignment(topology_id:PradarLogTopology-3-1626751922, executors:[ExecutorInfo(task_start:4, task_end:4), ExecutorInfo(task_start:1, task_end:1)], resources:WorkerResources(mem_on_heap:256.0, mem_off_heap:0.0, cpu:20.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, resources:{offheap.memory.mb=0.0, onheap.memory.mb=256.0, cpu.pcore.percent=20.0}, shared_resources:{}), owner:root) to null
      2021-07-21 02:03:42.353 o.a.s.d.s.Supervisor pool-10-thread-1 [WARN] Topology config is not localized yet...
      2021-07-21 02:03:42.449 o.a.s.d.s.Slot SLOT_6700 [INFO] SLOT 6700 all processes are dead...
      2021-07-21 02:03:42.449 o.a.s.d.s.Container SLOT_6700 [INFO] Cleaning up 8cbbfd6c-961b-482d-9175-cf9b79473808-172.26.137.86:b7963273-452a-43af-bc00-d814e0629f96
      2021-07-21 02:03:42.450 o.a.s.d.s.Container SLOT_6700 [INFO] GET worker-user for b7963273-452a-43af-bc00-d814e0629f96
      2021-07-21 02:03:42.450 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/b7963273-452a-43af-bc00-d814e0629f96/pids/16326
      2021-07-21 02:03:43.322 o.a.s.d.s.AdvancedFSOps SLOT_6701 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/26b5ffbd-08b6-46df-aa04-6b86f78b8ad8/pids
      2021-07-21 02:03:43.322 o.a.s.d.s.AdvancedFSOps SLOT_6701 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/26b5ffbd-08b6-46df-aa04-6b86f78b8ad8/tmp
      2021-07-21 02:03:45.209 o.a.s.d.s.BasicContainer Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28 exited with code: 1
      2021-07-21 02:03:45.224 o.a.s.d.s.AdvancedFSOps SLOT_6701 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/26b5ffbd-08b6-46df-aa04-6b86f78b8ad8
      2021-07-21 02:03:45.224 o.a.s.d.s.Supervisor pool-10-thread-7 [WARN] Topology config is not localized yet...
      2021-07-21 02:03:45.224 o.a.s.d.s.Container SLOT_6701 [INFO] REMOVE worker-user 26b5ffbd-08b6-46df-aa04-6b86f78b8ad8
      2021-07-21 02:03:45.224 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/b7963273-452a-43af-bc00-d814e0629f96/heartbeats
      2021-07-21 02:03:45.224 o.a.s.d.s.AdvancedFSOps SLOT_6701 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers-users/26b5ffbd-08b6-46df-aa04-6b86f78b8ad8
      2021-07-21 02:03:45.224 o.a.s.t.ProcessFunction pool-10-thread-7 [ERROR] Internal error processing sendSupervisorWorkerHeartbeat
      org.apache.storm.utils.WrappedNotAliveException: PradarLinkTopology-2-1626337413 does not appear to be alive, you should probably exit
          at org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:442) ~[storm-server-2.1.0.jar:2.1.0]
          at org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374) ~[storm-client-2.1.0.jar:2.1.0]
          at org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353) ~[storm-client-2.1.0.jar:2.1.0]
          at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) [storm-shaded-deps-2.1.0.jar:2.1.0]
          at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [storm-shaded-deps-2.1.0.jar:2.1.0]
          at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:174) [storm-client-2.1.0.jar:2.1.0]
          at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) [storm-shaded-deps-2.1.0.jar:2.1.0]
          at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) [storm-shaded-deps-2.1.0.jar:2.1.0]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
          at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
      2021-07-21 02:03:45.225 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/b7963273-452a-43af-bc00-d814e0629f96/pids
      2021-07-21 02:03:45.225 o.a.s.d.s.BasicContainer Thread-16 [INFO] Worker Process b7963273-452a-43af-bc00-d814e0629f96 exited with code: 254
      2021-07-21 02:03:45.225 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/b7963273-452a-43af-bc00-d814e0629f96/tmp
      2021-07-21 02:03:45.226 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/b7963273-452a-43af-bc00-d814e0629f96
      2021-07-21 02:03:45.226 o.a.s.d.s.Container SLOT_6700 [INFO] REMOVE worker-user b7963273-452a-43af-bc00-d814e0629f96
      2021-07-21 02:03:45.226 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers-users/b7963273-452a-43af-bc00-d814e0629f96
      2021-07-21 02:03:45.227 o.a.s.d.s.BasicContainer SLOT_6701 [INFO] Removed Worker ID 26b5ffbd-08b6-46df-aa04-6b86f78b8ad8
      2021-07-21 02:03:45.228 o.a.s.d.s.BasicContainer SLOT_6700 [INFO] Removed Worker ID b7963273-452a-43af-bc00-d814e0629f96
      2021-07-21 02:03:45.229 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE kill msInState: 81385 topo:PradarLogTopology-3-1626751922 worker:null -> empty msInState: 0
      2021-07-21 02:03:45.229 o.a.s.d.s.Slot SLOT_6700 [INFO] SLOT 6700: Changing current assignment from LocalAssignment(topology_id:PradarLogTopology-3-1626751922, executors:[ExecutorInfo(task_start:3, task_end:3)], resources:WorkerResources(mem_on_heap:128.0, mem_off_heap:0.0, cpu:10.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, resources:{offheap.memory.mb=0.0, onheap.memory.mb=128.0, cpu.pcore.percent=10.0}, shared_resources:{}), owner:root) to null
      2021-07-21 02:03:45.230 o.a.s.d.s.Slot SLOT_6701 [INFO] STATE kill-and-relaunch msInState: 95356 topo:PradarLogTopology-3-1626751922 worker:null -> waiting-for-blob-localization msInState: 1
      2021-07-21 02:03:45.231 o.a.s.d.s.Slot SLOT_6701 [INFO] SLOT 6701: Changing current assignment from LocalAssignment(topology_id:PradarLogTopology-3-1626751922, executors:[ExecutorInfo(task_start:3, task_end:3)], resources:WorkerResources(mem_on_heap:128.0, mem_off_heap:0.0, cpu:10.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, resources:{offheap.memory.mb=0.0, onheap.memory.mb=128.0, cpu.pcore.percent=10.0}, shared_resources:{}), owner:root) to null
      2021-07-21 02:03:45.231 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE empty msInState: 2 -> waiting-for-blob-localization msInState: 0
      2021-07-21 02:03:45.232 o.a.s.d.s.Slot SLOT_6701 [ERROR] Error when processing event
      java.io.FileNotFoundException: File '/data/apache-storm-2.1.0/status/supervisor/stormdist/PradarLinkTopology-4-1626751925/stormconf.ser' does not exist
          at org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:297) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
          at org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1851) ~[storm-shaded-deps-2.1.0.jar:2.1.0]
          at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfGivenPath(ConfigUtils.java:303) ~[storm-client-2.1.0.jar:2.1.0]
          at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfImpl(ConfigUtils.java:464) ~[storm-client-2.1.0.jar:2.1.0]
          at org.apache.storm.utils.ConfigUtils.readSupervisorStormConf(ConfigUtils.java:298) ~[storm-client-2.1.0.jar:2.1.0]
          at org.apache.storm.localizer.AsyncLocalizer.getLocalResources(AsyncLocalizer.java:351) ~[storm-server-2.1.0.jar:2.1.0]
          at org.apache.storm.localizer.AsyncLocalizer.releaseSlotFor(AsyncLocalizer.java:452) ~[storm-server-2.1.0.jar:2.1.0]
          at org.apache.storm.daemon.supervisor.Slot.handleWaitingForBlobLocalization(Slot.java:440) ~[storm-server-2.1.0.jar:2.1.0]
          at org.apache.storm.daemon.supervisor.Slot.stateMachineStep(Slot.java:228) ~[storm-server-2.1.0.jar:2.1.0]
          at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:931) [storm-server-2.1.0.jar:2.1.0]
      2021-07-21 02:03:45.234 o.a.s.u.Utils SLOT_6701 [ERROR] Halting process: Error when processing an event
      java.lang.RuntimeException: Halting process: Error when processing an event
          at org.apache.storm.utils.Utils.exitProcess(Utils.java:512) [storm-client-2.1.0.jar:2.1.0]
          at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:978) [storm-server-2.1.0.jar:2.1.0]
      2021-07-21 02:03:45.235 o.a.s.d.s.BasicContainer SLOT_6700 [INFO] Created Worker ID 68102ac7-a341-4d84-b1aa-db0f72934f99
      2021-07-21 02:03:45.236 o.a.s.d.s.Container SLOT_6700 [INFO] Setting up 8cbbfd6c-961b-482d-9175-cf9b79473808-172.26.137.86:68102ac7-a341-4d84-b1aa-db0f72934f99
      2021-07-21 02:03:45.236 o.a.s.d.s.Container SLOT_6700 [INFO] GET worker-user for 68102ac7-a341-4d84-b1aa-db0f72934f99
      2021-07-21 02:03:45.240 o.a.s.d.s.Container SLOT_6700 [INFO] SET worker-user 68102ac7-a341-4d84-b1aa-db0f72934f99 root
      2021-07-21 02:03:45.241 o.a.s.d.s.Container SLOT_6700 [INFO] Creating symlinks for worker-id: 68102ac7-a341-4d84-b1aa-db0f72934f99 storm-id: PradarLogTopology-3-1626751922 for files(1): [resources]
      2021-07-21 02:03:45.241 o.a.s.d.s.BasicContainer SLOT_6700 [INFO] Launching worker with assignment LocalAssignment(topology_id:PradarLogTopology-3-1626751922, executors:[ExecutorInfo(task_start:4, task_end:4), ExecutorInfo(task_start:1, task_end:1)], resources:WorkerResources(mem_on_heap:256.0, mem_off_heap:0.0, cpu:20.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, resources:{offheap.memory.mb=0.0, onheap.memory.mb=256.0, cpu.pcore.percent=20.0}, shared_resources:{}), owner:root) for this supervisor 8cbbfd6c-961b-482d-9175-cf9b79473808-172.26.137.86 on port 6700 with id 68102ac7-a341-4d84-b1aa-db0f72934f99
      2021-07-21 02:03:45.243 o.a.s.d.s.BasicContainer SLOT_6700 [INFO] Launching worker with command: '/usr/local/java/bin/java' '-cp' '/data/apache-storm-2.1.0/lib-worker/*:/data/apache-storm-2.1.0/extlib/*:/data/apache-storm-2.1.0/conf:/data/apache-storm-2.1.0/status/supervisor/stormdist/PradarLogTopology-3-1626751922/stormjar.jar' '-Xmx64m' '-Dlogging.sensitivity=S3' '-Dlogfile.name=worker.log' '-Dstorm.home=/data/apache-storm-2.1.0' '-Dworkers.artifacts=/data/apache-storm-2.1.0/logs/workers-artifacts' '-Dstorm.id=PradarLogTopology-3-1626751922' '-Dworker.id=68102ac7-a341-4d84-b1aa-db0f72934f99' '-Dworker.port=6700' '-Dstorm.log.dir=/data/apache-storm-2.1.0/logs' '-DLog4jContextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector' '-Dstorm.local.dir=/data/apache-storm-2.1.0/status' '-Dworker.memory_limit_mb=256' '-Dlog4j.configurationFile=/data/apache-storm-2.1.0/log4j2/worker.xml' 'org.apache.storm.LogWriter' '/usr/local/java/bin/java' '-server' '-Dlogging.sensitivity=S3' '-Dlogfile.name=worker.log' '-Dstorm.home=/data/apache-storm-2.1.0' '-Dworkers.artifacts=/data/apache-storm-2.1.0/logs/workers-artifacts' '-Dstorm.id=PradarLogTopology-3-1626751922' '-Dworker.id=68102ac7-a341-4d84-b1aa-db0f72934f99' '-Dworker.port=6700' '-Dstorm.log.dir=/data/apache-storm-2.1.0/logs' '-DLog4jContextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector' '-Dstorm.local.dir=/data/apache-storm-2.1.0/status' '-Dworker.memory_limit_mb=256' '-Dlog4j.configurationFile=/data/apache-storm-2.1.0/log4j2/worker.xml' '-Xmx256m' '-XX:+PrintGCDetails' '-Xloggc:artifacts/gc.log' '-XX:+PrintGCDateStamps' '-XX:+PrintGCTimeStamps' '-XX:+UseGCLogFileRotation' '-XX:NumberOfGCLogFiles=10' '-XX:GCLogFileSize=1M' '-XX:+HeapDumpOnOutOfMemoryError' '-XX:HeapDumpPath=artifacts/heapdump' '-Xms2g' '-Xmx2g' '-XX:MaxDirectMemorySize=512m' '-XX:+HeapDumpOnOutOfMemoryError' '-XX:HeapDumpPath=java.hprof' '-XX:MetaspaceSize=256m' '-XX:MaxMetaspaceSize=256m' '-XX:-OmitStackTraceInFastThrow' '-Djava.library.path=/data/apache-storm-2.1.0/status/supervisor/stormdist/PradarLogTopology-3-1626751922/resources/Linux-amd64:/data/apache-storm-2.1.0/status/supervisor/stormdist/PradarLogTopology-3-1626751922/resources:/usr/local/lib:/opt/local/lib:/usr/lib:/usr/lib64' '-Dstorm.conf.file=' '-Dstorm.options=' '-Djava.io.tmpdir=/data/apache-storm-2.1.0/status/workers/68102ac7-a341-4d84-b1aa-db0f72934f99/tmp' '-cp' '/data/apache-storm-2.1.0/lib-worker/*:/data/apache-storm-2.1.0/extlib/*:/data/apache-storm-2.1.0/conf:/data/apache-storm-2.1.0/status/supervisor/stormdist/PradarLogTopology-3-1626751922/stormjar.jar' 'org.apache.storm.daemon.worker.Worker' 'PradarLogTopology-3-1626751922' '8cbbfd6c-961b-482d-9175-cf9b79473808-172.26.137.86' '6628' '6700' '68102ac7-a341-4d84-b1aa-db0f72934f99'.
      2021-07-21 02:03:45.243 o.a.s.u.Utils Thread-5 [INFO] Halting after 1 seconds
      2021-07-21 02:03:45.244 o.a.s.d.s.Supervisor Thread-6 [INFO] Shutting down supervisor 8cbbfd6c-961b-482d-9175-cf9b79473808-172.26.137.86
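
      With a log this long, it can help to pull out only the ERROR entries that precede the "Shutting down supervisor" line. The following is a minimal sketch, not part of Storm: it assumes the entry layout visible in the log above (timestamp, logger, thread name, [LEVEL], message), and the names split_entries and errors_only are hypothetical helpers introduced here for illustration. Because it splits on the timestamp pattern rather than on newlines, it should also cope with a paste in which entries have run together.

```python
import re

# Start of one entry in the layout seen in the log above, e.g.
# "2021-07-21 02:03:45.232 o.a.s.d.s.Slot SLOT_6701 [ERROR] message"
ENTRY_START = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (\S+) (\S+) \[([A-Z]+)\] "
)


def split_entries(text):
    """Split raw log text into (timestamp, logger, thread, level, message) tuples.

    Everything between one entry header and the next (including any stack
    trace) is kept as that entry's message.
    """
    matches = list(ENTRY_START.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = text[m.end():end].strip()
        entries.append((m.group(1), m.group(2), m.group(3), m.group(4), message))
    return entries


def errors_only(text):
    """Return only the entries logged at ERROR level."""
    return [entry for entry in split_entries(text) if entry[3] == "ERROR"]
```

      To use it, read the log file into a string (for example the attached supervisor log, saved under whatever local path you choose) and print the timestamp and the first part of each message returned by errors_only.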
      

      Attachments

        1. supervisor(1).log
          611 kB
          Sunsy Sun

          People

            Assignee: Unassigned
            Reporter: Sunsy Sun