Apache Storm / STORM-3687

Add a warning about possible issues on a mixed cluster if the StormCommon.systemTopology implementation is changed


Details

    • Comment
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • None
    • 2.3.0
    • None
    • None

    Description

      During a rolling upgrade, some supervisors are still on 2.2 while others are already running 2.3. As a result, some workers run with storm-client-2.2.jar and others with storm-client-2.3.jar.

      The "__credentials" stream was removed by https://github.com/apache/storm/commit/93a7f770d508668bc7af183e08535813fff6f805 (STORM-3660), so the mapping between stream names and stream ids for the system component changed. See the code at:
      https://github.com/apache/storm/blob/v2.2.0/storm-client/src/jvm/org/apache/storm/serialization/SerializationFactory.java#L218-L222

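      A worker running on storm-client-2.2 might therefore send a "__metrics" tuple from its system bolt, while the MetricsConsumer running with storm-client-2.3 interprets it as "__metrics_tick", because the two versions map the same stream id to different stream names. The first idmap below still contains "__credentials" (2.2); the second does not (2.3):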

      o.a.s.s.SerializationFactory Thread-15-__system-executor[-1, -1] [INFO] idmap for system comp {__tick=6, __system=5, __metrics_tick=4, __credentials=1, __flush=2, __metrics=3}
      

      vs

      o.a.s.s.SerializationFactory Netty-server-localhost-6703-worker-1 [INFO] idmap for system comp {__tick=5, __system=4, __metrics_tick=3, __flush=1, __metrics=2}
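
The shift between the two idmaps can be reproduced in isolation. The following is a minimal sketch, assuming ids are assigned 1..n in sorted order of stream name, which is consistent with both logged idmaps. The class `StreamIdSkew` and its helpers are hypothetical names for illustration and are not Storm's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamIdSkew {
    // Assumption: ids are 1..n in sorted name order (matches the logged idmaps).
    public static Map<String, Integer> idify(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        Map<String, Integer> ids = new HashMap<>();
        int next = 1;
        for (String name : sorted) {
            ids.put(name, next++);
        }
        return ids;
    }

    // Reverse lookup: which stream name does this side associate with the id?
    public static String nameOf(Map<String, Integer> ids, int id) {
        for (Map.Entry<String, Integer> e : ids.entrySet()) {
            if (e.getValue() == id) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // System streams as seen by a 2.2 worker (includes __credentials).
        Map<String, Integer> sender = idify(Arrays.asList(
            "__tick", "__system", "__metrics_tick", "__credentials", "__flush", "__metrics"));
        // System streams as seen by a 2.3 worker (__credentials removed).
        Map<String, Integer> receiver = idify(Arrays.asList(
            "__tick", "__system", "__metrics_tick", "__flush", "__metrics"));

        int wireId = sender.get("__metrics");
        System.out.println("2.2 sends __metrics as id " + wireId
            + "; 2.3 reads id " + wireId + " as " + nameOf(receiver, wireId));
    }
}
```

Under that assumption, removing one name that sorts first shifts every later id down by one, which is why the receiver resolves the sender's "__metrics" id to "__metrics_tick".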
      

      As a result, the worker running the metrics consumer fails with:

      Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.storm.metric.api.IMetricsConsumer$TaskInfo cannot be cast to java.lang.Integer
              at org.apache.storm.executor.Executor.accept(Executor.java:293) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:167) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:153) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.Utils$1.run(Utils.java:398) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              ... 1 more
      Caused by: java.lang.ClassCastException: org.apache.storm.metric.api.IMetricsConsumer$TaskInfo cannot be cast to java.lang.Integer
              at org.apache.storm.tuple.TupleImpl.getInteger(TupleImpl.java:121) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.Executor.metricsTick(Executor.java:320) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:213) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.Executor.accept(Executor.java:286) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:167) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:153) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.Utils$1.run(Utils.java:398) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              ... 1 more
      2020-08-04 18:26:08.017 o.a.s.m.c
      
      
      
      2020-08-04 18:26:08.027 o.a.s.u.Utils Thread-19-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[1555, 1555] [ERROR] Halting process: Worker died
      java.lang.RuntimeException: Halting process: Worker died
              at org.apache.storm.utils.Utils.exitProcess(Utils.java:518) [storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.Utils$3.run(Utils.java:870) [storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.error.ReportErrorAndDie.uncaughtException(ReportErrorAndDie.java:41) [storm-client-2.3.0.y.jar:2.3.0.y]
              at java.lang.Thread.dispatchUncaughtException(Thread.java:1959) [?:1.8.0_242]
      2020-08-04 18:26:08.027 o.a.s.u.Utils Thread-18-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[1553, 1553] [ERROR] Halting process: Worker died
      java.lang.RuntimeException: Halting process: Worker died
              at org.apache.storm.utils.Utils.exitProcess(Utils.java:518) [storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.Utils$3.run(Utils.java:870) [storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.error.ReportErrorAndDie.uncaughtException(ReportErrorAndDie.java:41) [storm-client-2.3.0.y.jar:2.3.0.y]
              at java.lang.Thread.dispatchUncaughtException(Thread.java:1959) [?:1.8.0_242]
      2020-08-04 18:26:08.028 o.a.s.u.Utils ShutdownHook-sleepKill-3s [INFO] Halting after 3 seconds
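
The ClassCastException in the trace above is what one would expect when a metrics data tuple is handled as a metrics tick. Here is a minimal sketch of that failure mode, assuming the tick handler reads an Integer from field 0 (as the `getInteger` frame suggests) while the misrouted tuple carries a TaskInfo there. `MisroutedTupleDemo`, its nested `TaskInfo`, and the field layout are hypothetical stand-ins, not Storm classes.

```java
import java.util.Arrays;
import java.util.List;

public class MisroutedTupleDemo {
    // Hypothetical stand-in for IMetricsConsumer.TaskInfo.
    public static final class TaskInfo {
        public final int srcTaskId;

        public TaskInfo(int srcTaskId) {
            this.srcTaskId = srcTaskId;
        }
    }

    // Mimics a typed tuple getter: casts field i to Integer without checking.
    public static Integer getInteger(List<Object> values, int i) {
        return (Integer) values.get(i);
    }

    // Returns true if treating the tuple as a metrics tick fails with a cast error.
    public static boolean failsAsMetricsTick(List<Object> values) {
        try {
            getInteger(values, 0);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumed shapes: a tick tuple carries an Integer; a metrics tuple carries a TaskInfo first.
        List<Object> tickTuple = Arrays.<Object>asList(60);
        List<Object> metricsTuple = Arrays.<Object>asList(new TaskInfo(7), "datapoints");

        System.out.println("tick tuple fails: " + failsAsMetricsTick(tickTuple));
        System.out.println("metrics tuple fails: " + failsAsMetricsTick(metricsTuple));
    }
}
```

In the real worker the exception is not caught, so it reaches the uncaught-exception handler and the process halts, as the "Worker died" entries show.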
      

      Since a mixed-version cluster is not guaranteed to work, I am not going to fix this for now. I will, however, add some warnings in the code so that people are aware of the issue.

            People

              ethanli Ethan Li
              ethanli Ethan Li


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m