Apache Storm / STORM-3687

Add a warning about possible issues on a mixed cluster if the StormCommon.systemTopology implementation is changed

    Details

    • Type: Comment
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels:
      None

      Description

      During a rolling upgrade, some supervisors are still on 2.2 while others are already on 2.3, so some workers run with storm-client-2.2.jar and others with storm-client-2.3.jar.

      Commit https://github.com/apache/storm/commit/93a7f770d508668bc7af183e08535813fff6f805 (STORM-3660) removed the "__credentials" stream, so the mapping between stream names and stream ids for the system component changed. Refer to the code at:
      https://github.com/apache/storm/blob/v2.2.0/storm-client/src/jvm/org/apache/storm/serialization/SerializationFactory.java#L218-L222

      As a result, a worker running storm-client-2.2 might send out a "__metrics" tuple from its system bolt, while a MetricsConsumer running with storm-client-2.3 interprets it as "__metrics_tick", because the two versions assign different ids (the 2.2 map first, then the 2.3 map):

      o.a.s.s.SerializationFactory Thread-15-__system-executor[-1, -1] [INFO] idmap for system comp {__tick=6, __system=5, __metrics_tick=4, __credentials=1, __flush=2, __metrics=3}
      

      vs

      o.a.s.s.SerializationFactory Netty-server-localhost-6703-worker-1 [INFO] idmap for system comp {__tick=5, __system=4, __metrics_tick=3, __flush=1, __metrics=2}
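
      The shift between the two maps can be reproduced with a minimal sketch. It assumes that ids are assigned by sorting the stream names and numbering them from 1; that rule is an inference that matches both logged maps, not a copy of the Storm code. The class StreamIdShift and the helpers assignIds and nameOf are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StreamIdShift {
    // System-component stream names as they appear in the two logged maps.
    static final List<String> STREAMS_2_2 = Arrays.asList(
        "__tick", "__system", "__metrics_tick", "__credentials", "__flush", "__metrics");
    static final List<String> STREAMS_2_3 = Arrays.asList(
        "__tick", "__system", "__metrics_tick", "__flush", "__metrics");

    // Assumption: ids are handed out in sorted-name order, starting at 1.
    static Map<String, Integer> assignIds(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        Map<String, Integer> ids = new LinkedHashMap<>();
        int next = 1;
        for (String name : sorted) {
            ids.put(name, next++);
        }
        return ids;
    }

    // Reverse lookup: which stream name does this side associate with the id?
    static String nameOf(Map<String, Integer> ids, int id) {
        for (Map.Entry<String, Integer> e : ids.entrySet()) {
            if (e.getValue() == id) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Integer> sender = assignIds(STREAMS_2_2);   // 2.2 worker
        Map<String, Integer> receiver = assignIds(STREAMS_2_3); // 2.3 worker
        int wireId = sender.get("__metrics");
        System.out.println("2.2 ids: " + sender);
        System.out.println("2.3 ids: " + receiver);
        System.out.println("__metrics sent as id " + wireId
            + ", read by 2.3 as " + nameOf(receiver, wireId));
    }
}
```

      Under this assumption, removing "__credentials" (the first name in sorted order) shifts every remaining id down by one, so id 3 means "__metrics" on the 2.2 side and "__metrics_tick" on the 2.3 side.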
      

      Hence we see the following on the worker running storm-client-2.3:

      Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.storm.metric.api.IMetricsConsumer$TaskInfo cannot be cast to java.lang.Integer
              at org.apache.storm.executor.Executor.accept(Executor.java:293) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:167) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:153) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.Utils$1.run(Utils.java:398) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              ... 1 more
      Caused by: java.lang.ClassCastException: org.apache.storm.metric.api.IMetricsConsumer$TaskInfo cannot be cast to java.lang.Integer
              at org.apache.storm.tuple.TupleImpl.getInteger(TupleImpl.java:121) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.Executor.metricsTick(Executor.java:320) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:213) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.Executor.accept(Executor.java:286) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:167) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:153) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.Utils$1.run(Utils.java:398) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              ... 1 more
      2020-08-04 18:26:08.017 o.a.s.m.c
      
      
      
      2020-08-04 18:26:08.027 o.a.s.u.Utils Thread-19-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[1555, 1555] [ERROR] Halting process: Worker died
      java.lang.RuntimeException: Halting process: Worker died
              at org.apache.storm.utils.Utils.exitProcess(Utils.java:518) [storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.Utils$3.run(Utils.java:870) [storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.error.ReportErrorAndDie.uncaughtException(ReportErrorAndDie.java:41) [storm-client-2.3.0.y.jar:2.3.0.y]
              at java.lang.Thread.dispatchUncaughtException(Thread.java:1959) [?:1.8.0_242]
      2020-08-04 18:26:08.027 o.a.s.u.Utils Thread-18-__metrics_org.apache.storm.metric.LoggingMetricsConsumer-executor[1553, 1553] [ERROR] Halting process: Worker died
      java.lang.RuntimeException: Halting process: Worker died
              at org.apache.storm.utils.Utils.exitProcess(Utils.java:518) [storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.Utils$3.run(Utils.java:870) [storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.error.ReportErrorAndDie.uncaughtException(ReportErrorAndDie.java:41) [storm-client-2.3.0.y.jar:2.3.0.y]
              at java.lang.Thread.dispatchUncaughtException(Thread.java:1959) [?:1.8.0_242]
      2020-08-04 18:26:08.028 o.a.s.u.Utils ShutdownHook-sleepKill-3s [INFO] Halting after 3 seconds
      

      Since a mixed-version cluster is not guaranteed to work, I am not going to fix this for now, but I will add warnings in the code so that people are aware of it.

              People

              • Assignee: Ethan Li (ethanli)
              • Reporter: Ethan Li (ethanli)

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m