Apache Storm / STORM-3016

Nimbus goes down when a job has a large number of high-parallelism components


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: storm-core

    Description

      When a job with a large number of high-parallelism components (a total parallelism of 5000, for example) is submitted to a Storm cluster, Nimbus may crash. The workflow is as follows:

      1) Nimbus computes the assignment.

      2) Nimbus sends the assignment to ZooKeeper.

      3) When the serialized assignment mapping is too long, because the job's total parallelism is too large, writing it to ZooKeeper fails (the default znode data length limit is 1 MB).

      4) Nimbus keeps retrying the write; after a number of attempts it gives up and crashes. When that happens, the stability of the cluster is greatly impacted.
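      The failure mode in step 3 can be illustrated with a back-of-the-envelope sketch. The helper and the per-executor byte count below are illustrative assumptions, not Storm's actual assignment serialization; the only hard number is ZooKeeper's default znode size limit (`jute.maxbuffer`, 1 MB):

```python
# Rough sketch (not Storm's real serialization): estimate how the serialized
# assignment grows with total parallelism and compare it against ZooKeeper's
# default znode size limit (jute.maxbuffer, 1 MB).
ZNODE_LIMIT = 1 * 1024 * 1024  # ZooKeeper default max znode data length

def estimate_assignment_size(total_parallelism, bytes_per_executor=250):
    """Hypothetical estimate: assume each executor contributes a
    host:port/task-range entry of roughly `bytes_per_executor` bytes."""
    return total_parallelism * bytes_per_executor

for parallelism in (500, 2000, 5000):
    size = estimate_assignment_size(parallelism)
    fits = size <= ZNODE_LIMIT
    print(f"parallelism={parallelism}: ~{size} bytes, fits in one znode: {fits}")
```

      Under these assumptions the assignment for a parallelism-5000 job overshoots the 1 MB limit, so the `setData` call to ZooKeeper is rejected no matter how often Nimbus retries, which matches the retry-then-crash loop described in step 4.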

      Attachments

        1. nimbus.log (787 kB) — uploaded by StaticMian


          People

            Assignee: Unassigned
            Reporter: StaticMian
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:

              Time Tracking

                Original Estimate: 96h
                Remaining Estimate: 96h
                Time Spent: Not Specified