Apache Storm / STORM-3501

Local Cluster worker restarts


Details

    Description

      I was trying to launch a topology that I'm developing (in 2.0.0) and noticed that the worker was getting restarted every ~30 seconds. 
      I placed a breakpoint in the kill method of LocalContainer (https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/LocalContainer.java#L66) to try and understand why the worker was getting restarted. 
       
      The call stack was:
      kill:66, LocalContainer (org.apache.storm.daemon.supervisor)
      killContainerFor:269, Slot (org.apache.storm.daemon.supervisor)
      handleRunning:724, Slot (org.apache.storm.daemon.supervisor)
      stateMachineStep:218, Slot (org.apache.storm.daemon.supervisor)
      run:931, Slot (org.apache.storm.daemon.supervisor)
       
      With this I can understand that the worker is killed because a blob has changed (https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/Slot.java#L724). In fact, there's a changing blob in the dynamicState at that point.
       
      I checked the AsyncLocalizer which downloads, caches blobs locally, and notifies the Slot state machine of a changing blob.
       
      I noticed this:

      Which tells me that (correct me if I'm wrong):

      • The Supervisor tries to update blobs every 30 seconds.
      • The topology jar blob requires extraction of the resources directory (either from a jar or directly from a classpath URL). The extraction happens in fetchUnzipToTemp, and the directory's existence is checked in isFullyDownloaded.
      • The Slot is notified of a changing blob if:
          • the remote version is different from the local version (the code has changed),
          • OR the blob is not fully downloaded (fully downloaded means that the jar exists and the extracted resources directory exists).
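
      The condition described in the list above can be sketched as a small, self-contained model. This is only an illustration under stated assumptions: the class BlobChangeSketch, the LocalBlob type, and the methods shouldNotifySlot and countNotifications are hypothetical names, not Storm's actual classes or signatures. Only isFullyDownloaded borrows its name from the method mentioned above, and its body here is a simplification.

```java
/** Hypothetical model of the blob check described above; not Storm's actual code. */
public class BlobChangeSketch {

    /** Simplified stand-in for a locally cached topology jar blob (assumed shape). */
    public static final class LocalBlob {
        public final long localVersion;
        public final boolean jarExists;
        public final boolean resourcesDirExists;

        public LocalBlob(long localVersion, boolean jarExists, boolean resourcesDirExists) {
            this.localVersion = localVersion;
            this.jarExists = jarExists;
            this.resourcesDirExists = resourcesDirExists;
        }

        /** Fully downloaded means both the jar and the extracted resources directory exist. */
        public boolean isFullyDownloaded() {
            return jarExists && resourcesDirExists;
        }
    }

    /** The Slot is notified if the version changed OR the blob is not fully downloaded. */
    public static boolean shouldNotifySlot(LocalBlob blob, long remoteVersion) {
        return remoteVersion != blob.localVersion || !blob.isFullyDownloaded();
    }

    /**
     * Counts notifications over several periodic update rounds. The model assumes that
     * re-downloading never produces a resources directory when the jar does not contain one,
     * so the blob state is the same in every round.
     */
    public static int countNotifications(LocalBlob blob, long remoteVersion, int rounds) {
        int notifications = 0;
        for (int i = 0; i < rounds; i++) {
            if (shouldNotifySlot(blob, remoteVersion)) {
                notifications++;
            }
        }
        return notifications;
    }

    public static void main(String[] args) {
        LocalBlob withoutResources = new LocalBlob(1L, true, false);
        LocalBlob withResources = new LocalBlob(1L, true, true);

        // Same version on both sides, but no resources directory: notified every round.
        System.out.println(countNotifications(withoutResources, 1L, 3)); // prints 3
        // Same version and a resources directory: never notified.
        System.out.println(countNotifications(withResources, 1L, 3));    // prints 0
    }
}
```

      In this model, a topology jar without a resources directory is reported as changing on every update round even though its version never changes, which would match the restart every ~30 seconds.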

       
      I did not have a resources folder under the root of the classpath, and that's why the worker was being restarted every ~30 seconds: the Slot was being notified of a changing blob every time updateBlobs ran. 
      I created a resources folder (with dummy files) under the root of the classpath and the problem is now solved.
       
      However, if I understand correctly, the resources folder is only required for multilang. Our topologies do not use multilang, and this does not happen in Storm 1.1.3, for instance.

       

      Happy to submit an MR.


          People

            diogopmonteiro Diogo Monteiro
            diogopmonteiro Diogo Monteiro

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 40m
