Apache Storm / STORM-2682

Supervisor crashes with NullPointerException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.4, 1.1.1
    • Fix Version/s: 2.0.0, 1.2.0, 1.1.2, 1.0.5
    • Component/s: None
    • Labels: None
    • Environment:
      Dockerized, based on Debian Jessie, running on Ubuntu Trusty, OpenJDK8

      Description

      When the supervisor is started, it dies after about 30 seconds with the following log:

      ...
      2017-08-07 17:12:04.606 o.a.s.d.s.Slot main [WARN] SLOT 192.168.10.21:6701 Starting in state EMPTY - assignment null
      2017-08-07 17:12:04.607 o.a.s.d.s.Slot main [WARN] SLOT 192.168.10.21:6702 Starting in state EMPTY - assignment null
      2017-08-07 17:12:04.607 o.a.s.l.AsyncLocalizer main [INFO] Cleaning up unused topologies in /home/storm/data/supervisor/stormdist
      2017-08-07 17:12:04.617 o.a.s.d.s.Supervisor main [INFO] Starting supervisor with id 65a0f977-474c-4938-a4f5-bc99939e96ff at host 192.168.10.21.
      2017-08-07 17:12:04.619 o.a.s.d.m.MetricsUtils main [INFO] Using statistics reporter plugin:org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter
      2017-08-07 17:12:04.620 o.a.s.d.m.r.JmxPreparableReporter main [INFO] Preparing...
      2017-08-07 17:12:04.624 o.a.s.m.StormMetricsRegistry main [INFO] Started statistics report plugin...
      2017-08-07 17:12:34.620 o.a.s.e.EventManagerImp Thread-4 [ERROR] {} Error when processing event
      java.lang.NullPointerException: null
              at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) ~[?:1.8.0_121]
              at org.apache.storm.localizer.Localizer.updateBlobs(Localizer.java:332) ~[storm-core-1.0.4.jar:1.0.4]
              at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.updateBlobsForTopology(UpdateBlobs.java:99) ~[storm-core-1.0.4.jar:1.0.4]
              at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.run(UpdateBlobs.java:72) ~[storm-core-1.0.4.jar:1.0.4]
              at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:54) ~[storm-core-1.0.4.jar:1.0.4]
      2017-08-07 17:12:34.620 o.a.s.u.Utils Thread-4 [ERROR] Halting process: Error when processing an event
      java.lang.RuntimeException: Halting process: Error when processing an event
              at org.apache.storm.utils.Utils.exitProcess(Utils.java:1750) ~[storm-core-1.0.4.jar:1.0.4]
              at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:63) ~[storm-core-1.0.4.jar:1.0.4]
      2017-08-07 17:12:34.631 o.a.s.d.s.Supervisor Thread-5 [INFO] Shutting down supervisor 65a0f977-474c-4938-a4f5-bc99939e96ff
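The top frame of the trace is consistent with a null key: `ConcurrentHashMap` does not permit null keys, and its `get` method throws `NullPointerException` when passed `null`. The sketch below assumes the value `Localizer.updateBlobs` passed to `get` was null; the `NullKeyLookup` class, its methods, and the map contents are hypothetical illustrations, not Storm's code or the fix from the pull request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a keyed lookup; not Storm's actual Localizer code.
final class NullKeyLookup {

    // Fails like the trace above: ConcurrentHashMap.get(null) throws NullPointerException.
    static String unsafeLookup(Map<String, String> cache, String key) {
        return cache.get(key);
    }

    // Defensive variant: a null key is reported as "no entry" instead of crashing the caller.
    static String safeLookup(Map<String, String> cache, String key) {
        if (key == null) {
            return null;
        }
        return cache.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        cache.put("topology-1", "blob-version-1"); // illustrative entry only
        System.out.println(safeLookup(cache, "topology-1"));
        System.out.println(safeLookup(cache, null));
        try {
            unsafeLookup(cache, null);
        } catch (NullPointerException e) {
            System.out.println("ConcurrentHashMap.get(null) threw NullPointerException");
        }
    }
}
```

A null check like this only avoids the crash; it does not explain why the key was null in the first place.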
      


          Activity

          revans2 Robert Joseph Evans added a comment -

          Trying to reproduce this...

          revans2 Robert Joseph Evans added a comment -

          Martin Burian

          Are you running a secure cluster or is it insecure?

          buriama8 Martin Burian added a comment -

          I have not set up any security (authentication or anything else), so I assume it is insecure.

          revans2 Robert Joseph Evans added a comment -

          And you saw this while upgrading from 1.0.3 to 1.0.4?

          revans2 Robert Joseph Evans added a comment -

          I was able to reproduce it. It looks like some of the code we added to support rolling upgrades is either not working properly or not being executed. I'll do some more debugging and also see if I can come up with a workaround.

          revans2 Robert Joseph Evans added a comment -

          I figured out the issue. I'm not sure why my tests didn't catch it to begin with. I'll get a pull request up ASAP.

          smaldeniya Sahan Maldeniya added a comment -

          I also get this issue. We are using 1.1.1, and it is not running in secure mode.

          dan.blanchard Dan Blanchard added a comment -

          We started experiencing this immediately upon upgrading from 1.1.0 to 1.1.1 on one cluster and from 1.0.2 to 1.1.1 on another. Oddly enough, the cluster seems to keep processing even though the supervisors die repeatedly.

          kabhwan Jungtaek Lim added a comment -

          Dan Blanchard
          Could you help us confirm that Bobby's patch resolves the issue? The patch for the 1.x version line is ready at https://github.com/apache/storm/pull/2267. If you don't want to apply it manually, I can provide a patched 1.1.2-SNAPSHOT storm-core jar.

          dan.blanchard Dan Blanchard added a comment -

          Sure, if you can provide a 1.1.2-SNAPSHOT JAR I can test this out on our beta cluster.

          revans2 Robert Joseph Evans added a comment -

          Dan Blanchard,

          The supervisor only needs to be up to launch and kill workers. This failure happens about 30 seconds after the supervisor comes up (on a background thread), so each supervisor gets about 30 seconds of real work done before it crashes. That is enough time to kill or launch a lot of workers.
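The log in the description shows why an error on a background thread takes down the whole supervisor: the exception escapes an event run by `EventManagerImp`, and `Utils.exitProcess` then halts the process. The sketch below models that fail-fast pattern with a hypothetical `FailFastEventLoop` class; the injected halt hook (instead of a real process exit) is an assumption made so the behavior can be observed, and this is not how Storm's event manager is actually written.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical fail-fast event loop: the first uncaught error in any event halts the "process".
final class FailFastEventLoop {
    private final BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
    private final Runnable haltHook; // stands in for a real process exit
    private final Thread worker = new Thread(this::runLoop, "event-loop");
    private volatile boolean halted = false;

    FailFastEventLoop(Runnable haltHook) {
        this.haltHook = haltHook;
        worker.setDaemon(true);
    }

    void start() { worker.start(); }

    void add(Runnable event) { events.add(event); }

    boolean isHalted() { return halted; }

    private void runLoop() {
        try {
            while (true) {
                events.take().run(); // events run one at a time on the background thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // normal shutdown path
        } catch (Throwable t) {
            // Any error escaping an event is treated as fatal for the whole daemon.
            halted = true;
            haltHook.run();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch haltCalled = new CountDownLatch(1);
        FailFastEventLoop loop = new FailFastEventLoop(() -> {
            System.out.println("Halting process: Error when processing an event");
            haltCalled.countDown();
        });
        loop.start();
        loop.add(() -> System.out.println("first event ok"));
        loop.add(() -> { throw new NullPointerException("simulated"); });
        haltCalled.await(5, TimeUnit.SECONDS);
    }
}
```

Everything queued before the failing event still runs, which matches the observation that a supervisor gets real work done before it dies.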

          kabhwan Jungtaek Lim added a comment -

          Dan Blanchard
          I just attached a patched storm-core 1.1.2-SNAPSHOT jar.

          smaldeniya Sahan Maldeniya added a comment -

          Jungtaek Lim, I replaced storm-core-1.1.1.jar with the attached jar and restarted our test cluster. For the last hour it has run without the supervisor restarts and worker kills caused by this issue.

          kabhwan Jungtaek Lim added a comment -

          Sahan Maldeniya
          Thanks for reporting!

          kabhwan Jungtaek Lim added a comment -

          Thanks, Robert Joseph Evans, for fixing this critical issue so quickly. I merged the fix into all 1.x version lines and master.
          Let's discuss another round of bug fix releases soon.


            People

            • Assignee: revans2 Robert Joseph Evans
            • Reporter: buriama8 Martin Burian
            • Votes: 1
            • Watchers: 6

                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0h
                • Logged: 1h
