Apache Storm / STORM-1732

Resources are deleted when worker dies

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Invalid
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: storm-core
    • Labels: None
    • Environment: Windows

    Description

      Let's say a worker has been started by the supervisor:

      2016-04-26 16:11:48.716 [o.a.s.d.supervisor] INFO: Launching worker with assignment {:storm-id "Lightning-1-1461683473", :executors [[12 12] [54 54] [42 42] [24 24] [18 18] [6 6] [48 48] [30 30] [36 36]], :resources #object[org.apache.storm.generated.WorkerResources 0x10bac1e4 "WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0)"]} for this supervisor 477ae22e-1a2b-4ea3-afd5-cb969f25e732 on port 6700 with id a5d51626-6e9f-4614-9ebb-a6263c140ca2
      2016-04-26 16:11:48.727 [o.a.s.d.supervisor] INFO: Launching worker with command: 'C:\LightningDeployment\Java\bin\java' '-cp' ........
      2016-04-26 16:11:48.910 [o.a.s.config] INFO: SET worker-user a5d51626-6e9f-4614-9ebb-a6263c140ca2 LIGHTNINGVM14$

      Note that this bit is new in Storm 1.0.0:

      2016-04-26 16:11:49.405 [o.a.s.d.supervisor] INFO: Creating symlinks for worker-id: a5d51626-6e9f-4614-9ebb-a6263c140ca2 storm-id: Lightning-1-1461683473 to its port artifacts directory
      2016-04-26 16:11:50.251 [o.a.s.d.supervisor] INFO: Creating symlinks for worker-id: a5d51626-6e9f-4614-9ebb-a6263c140ca2 storm-id: Lightning-1-1461683473 for files(1): ("resources")

      When a worker dies, we correctly see some cleanup and a new worker being started:

      2016-04-26 16:15:35.520 [o.a.s.d.supervisor] INFO: Worker Process a5d51626-6e9f-4614-9ebb-a6263c140ca2 exited with code: 20
      2016-04-26 16:15:39.674 [o.a.s.d.supervisor] INFO: Worker Process a5d51626-6e9f-4614-9ebb-a6263c140ca2 has died!
      2016-04-26 16:15:39.675 [o.a.s.d.supervisor] INFO: Shutting down and clearing state for id a5d51626-6e9f-4614-9ebb-a6263c140ca2. Current supervisor time: 1461683739. State: :timed-out, Heartbeat: {:time-secs 1461683734, :storm-id "Lightning-1-1461683473", :executors [[12 12] [54 54] [42 42] [24 24] [18 18] [6 6] [48 48] [30 30] [-1 -1] [36 36]], :port 6700}
      2016-04-26 16:15:39.676 [o.a.s.d.supervisor] INFO: Shutting down 477ae22e-1a2b-4ea3-afd5-cb969f25e732:a5d51626-6e9f-4614-9ebb-a6263c140ca2
      2016-04-26 16:15:39.676 [o.a.s.config] INFO: GET worker-user a5d51626-6e9f-4614-9ebb-a6263c140ca2
      2016-04-26 16:15:39.677 [o.a.s.d.supervisor] INFO: Worker Process a5d51626-6e9f-4614-9ebb-a6263c140ca2 has died!
      2016-04-26 16:15:39.681 [o.a.s.d.supervisor] INFO: Worker Process a5d51626-6e9f-4614-9ebb-a6263c140ca2 has died!
      2016-04-26 16:15:39.857 [o.a.s.util] INFO: Error when trying to kill 1352. Process is probably already dead.
      2016-04-26 16:15:39.955 [o.a.s.util] INFO: Error when trying to kill 2372. Process is probably already dead.
      2016-04-26 16:15:40.009 [o.a.s.util] INFO: Error when trying to kill 4932. Process is probably already dead.
      2016-04-26 16:15:40.009 [o.a.s.d.supervisor] INFO: Sleep 10 seconds for execution of cleanup threads on worker.
      2016-04-26 16:15:49.677 [o.a.s.d.supervisor] INFO: Worker Process a5d51626-6e9f-4614-9ebb-a6263c140ca2 has died!
      2016-04-26 16:15:49.679 [o.a.s.d.supervisor] INFO: Worker Process a5d51626-6e9f-4614-9ebb-a6263c140ca2 has died!
      2016-04-26 16:15:50.056 [o.a.s.util] INFO: Error when trying to kill 1352. Process is probably already dead.
      2016-04-26 16:15:50.119 [o.a.s.util] INFO: Error when trying to kill 2372. Process is probably already dead.
      2016-04-26 16:15:50.175 [o.a.s.util] INFO: Error when trying to kill 4932. Process is probably already dead.
      2016-04-26 16:15:50.257 [o.a.s.config] INFO: REMOVE worker-user a5d51626-6e9f-4614-9ebb-a6263c140ca2
      2016-04-26 16:15:50.257 [o.a.s.d.supervisor] INFO: Shut down 477ae22e-1a2b-4ea3-afd5-cb969f25e732:a5d51626-6e9f-4614-9ebb-a6263c140ca2
      2016-04-26 16:15:50.257 [o.a.s.d.supervisor] INFO: Launching worker with assignment {:storm-id "Lightning-1-1461683473", :executors [[12 12] [54 54] [42 42] [24 24] [18 18] [6 6] [48 48] [30 30] [36 36]], :resources #object[org.apache.storm.generated.WorkerResources 0x20e1ad4f "WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0)"]} for this supervisor 477ae22e-1a2b-4ea3-afd5-cb969f25e732 on port 6700 with id e413447b-c9ca-417d-8e55-e10dd0edc6a4

      When the worker is cleaned up, the folders that the symlinks point to also appear to be deleted (this may be a Windows-only problem).

      This is bad because it deletes the contents of the "resources" directory, and with it any multilang files that were in those directories.

      Also, I think STORM-876 introduced this problem.
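
      To illustrate the behaviour one would expect from the cleanup step, here is a minimal Java sketch of a recursive delete that removes symlinks as links and never descends into their targets. It is not the supervisor's actual code and not the attached patch; the class SafeDelete and the method deleteWithoutFollowingLinks are hypothetical names made up for this example, and its behaviour on Windows has not been checked here.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper for illustration only; not part of Storm's API.
class SafeDelete {

    /**
     * Recursively deletes root. Symbolic links found on the way are removed
     * as links; the directories or files they point to are left untouched.
     */
    static void deleteWithoutFollowingLinks(Path root) throws IOException {
        if (Files.isSymbolicLink(root)) {
            Files.delete(root); // removes the link itself, not its target
            return;
        }
        if (!Files.exists(root)) {
            return; // nothing to clean up
        }
        // walkFileTree does not follow symbolic links unless FOLLOW_LINKS is
        // passed, so a link to a directory is reported through visitFile.
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file); // for a symlink this deletes only the link
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir); // directory is empty at this point
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

      Calling SafeDelete.deleteWithoutFollowingLinks on a worker directory that contains a "resources" symlink would remove the worker directory and the link, while the directory the link points to keeps its contents.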

      Attachments

        1. potentalFix.patch
          1 kB
          gareth smith

          People

            Assignee: Unassigned
            Reporter: gareth smith (hairywelly)
            Votes: 0
            Watchers: 4
