Apache Storm / STORM-167

Proposal for Storm topology online update

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: storm-core
    • Labels: None

      Description

      https://github.com/nathanmarz/storm/issues/540

      Currently, topology code can only be updated by killing the topology and re-submitting a new one. During the kill and re-submit process, some requests may be delayed or fail, which is a problem for an online service. We therefore propose adding online topology update.

      Mission

      Update the code of a running topology gracefully, one worker after another, without interrupting the whole service. Only the topology code is updated; the topology DAG structure (components, streams, and task numbers) is not changed.
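
      The restriction above implies a compatibility check before an update is accepted. The following is only a sketch of such a check, not code from the proposal: ComponentSpec and structure_diff are hypothetical names, and the dictionary representation of a topology is an assumption rather than Storm's API.

```python
from typing import Dict, List, NamedTuple


class ComponentSpec(NamedTuple):
    """Hypothetical summary of one spout or bolt (not a Storm type)."""
    streams: frozenset   # names of the streams the component declares
    num_tasks: int       # task count fixed when the topology was submitted


def structure_diff(old: Dict[str, ComponentSpec],
                   new: Dict[str, ComponentSpec]) -> List[str]:
    """Return the reasons why `new` changes the DAG structure of `old`.

    An empty list means only the code may differ, so an online update
    in the sense of the Mission section would be acceptable.
    """
    problems = []
    for name in sorted(old.keys() - new.keys()):
        problems.append(f"component removed: {name}")
    for name in sorted(new.keys() - old.keys()):
        problems.append(f"component added: {name}")
    for name in sorted(old.keys() & new.keys()):
        if old[name].streams != new[name].streams:
            problems.append(f"streams changed: {name}")
        if old[name].num_tasks != new[name].num_tasks:
            problems.append(f"task number changed: {name}")
    return problems
```

      Under these assumptions, an update request would be rejected whenever structure_diff returns a non-empty list, and the list itself could be reported back to the client.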

      Proposal

      • The client runs "storm update topology-name new-jar-file" to submit an update request with new-jar-file.
      • Nimbus updates the stormdist dir and links topology-dir to the new one.
      • Nimbus updates the topology version on zk.
      • The supervisors running this topology update it:
        • Check the topology version on zk; if it differs from the local version, a topology update begins.
        • Each supervisor schedules the update of the topology's workers at a rand(expect-max-update-time) time point.
        • sync-supervisor downloads the latest code from nimbus.
        • sync-process checks the local worker heartbeat version (to be added); if it differs from the version downloaded by sync-supervisor, it kills the worker.
        • sync-process restarts the killed worker.
        • The new worker heartbeats to zk with its version (to be added), which can be displayed on the web UI to check update progress.
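
      The supervisor-side steps can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the implementation deployed by the reporter: Worker, Supervisor, and update_progress are hypothetical names, ZooKeeper and the code download are replaced by plain values, and the random delay is assumed to be uniform over [0, expect-max-update-time].

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Worker:
    port: int
    heartbeat_version: int   # version the worker reports in its heartbeat


@dataclass
class Supervisor:
    local_version: int                       # version of the code on local disk
    workers: List[Worker] = field(default_factory=list)
    scheduled_at: Optional[float] = None     # when this supervisor will update

    def check_version(self, zk_version: int, now: float,
                      expect_max_update_time: float,
                      rng: random.Random) -> None:
        """Schedule an update if zk holds a different version.

        The random delay keeps supervisors from restarting their
        workers at the same moment (assumed uniform distribution).
        """
        if zk_version != self.local_version and self.scheduled_at is None:
            self.scheduled_at = now + rng.uniform(0, expect_max_update_time)

    def sync(self, zk_version: int, now: float) -> List[int]:
        """Run one sync pass; return the ports of the restarted workers."""
        if self.scheduled_at is None or now < self.scheduled_at:
            return []
        self.local_version = zk_version      # stands in for the code download
        restarted = []
        for w in self.workers:
            if w.heartbeat_version != self.local_version:
                # Kill and restart: the new worker heartbeats the new version.
                w.heartbeat_version = self.local_version
                restarted.append(w.port)
        self.scheduled_at = None
        return restarted


def update_progress(supervisors: List[Supervisor], zk_version: int) -> float:
    """Fraction of workers already heartbeating `zk_version`."""
    workers = [w for s in supervisors for w in s.workers]
    if not workers:
        return 1.0
    done = sum(w.heartbeat_version == zk_version for w in workers)
    return done / len(workers)
```

      In this sketch a supervisor restarts all of its outdated workers in a single sync pass; staggering across the cluster comes only from the per-supervisor random delay. The value returned by update_progress is the kind of figure a web UI could show while an update is rolling out.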

      This feature is deployed in our production clusters. It is really useful for topologies that handle online requests awaiting a response: the topology jar can be updated without taking the entire service offline.

      We hope that this feature is useful for others too.

            People

            • Assignee: Parth Brahmbhatt (parth.brahmbhatt)
            • Reporter: James Xu (xumingming)
            • Votes: 9
            • Watchers: 24

                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0h
                • Logged: 1h 40m