Uploaded image for project: 'Apache Storm'
  1. Apache Storm
  2. STORM-3735

Kyro serialization fails on some metric tuples when topology.fall.back.on.java.serialization is false

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.0.0, 2.1.0, 2.2.0
    • 2.3.0
    • None
    • None

    Description

      When a metric consumer is used, metrics will be sent from all executors to the consumer. In some of the metrics, it includes NodeInfo object, and kryo serialization will fail if topology.fall.back.on.java.serialization is false.

      worker logs
      2021-01-13 20:16:37.017 o.a.s.e.ExecutorTransfer Thread-16-__system-executor[-1, -1] [INFO] TRANSFERRING tuple [dest: 5 tuple: source: __system:-1, stream: __metrics, id: {}, [TASK_INFO: { host: openstorm14blue-n4.blue.ygrid.yahoo.com:6703 comp: __system[-1]}, [
      [CGroupCpuStat = {nr.throttled-percentage=46.544980443285525, nr.period-count=767, nr.throttled-count=357, throttled.time-ms=27208}], [CGroupMemoryLimit = 1342177280], [__recv-iconnection = {dequeuedMessages=0, enqueued={/10.215.73.210:47038=3169}}], [__send-ico
      nnection = {NodeInfo(node:149a917b-bc75-49c8-b351-f74b8ae0fbed-10.215.73.210, port:[6701])={reconnects=1, src=/10.215.73.210:34938, pending=0, dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6701, sent=1896, lostOnSend=0}, NodeInfo(node:149a917b-bc75-
      49c8-b351-f74b8ae0fbed-10.215.73.210, port:[6702])={reconnects=8, src=/10.215.73.210:39476, pending=0, dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6702, sent=2115, lostOnSend=0}, NodeInfo(node:b77b5ec6-15ee-4bd2-a9b8-12fcadde7744-10.215.73.211, po
      rt:[6700])={reconnects=125, pending=0, dest=openstorm14blue-n5.blue.ygrid.yahoo.com/10.215.73.211:6700, sent=108, lostOnSend=1331}}], [CGroupMemory = 316485632], [CGroupCpu = {user-ms=36960, sys-ms=25860}], [memory.pools.Metaspace.usage = 0.9695890907929322], [m
      emory.heap.max = 1073741824], [receive-queue-overflow = 0], [memory.pools.Compressed-Class-Space.used = 6237424], [memory.pools.Compressed-Class-Space.max = 1073741824], [memory.non-heap.init = 2555904], [worker-transfer-queue-overflow = 0], [memory.pools.Metasp
      ace.committed = 42074112], [receive-queue-sojourn_time_ms = 0.0], [threads.waiting.count = 5], [memory.pools.G1-Eden-Space.usage = 0.2777777777777778], [memory.pools.Metaspace.used = 40798320], [memory.total.used = 101783888], [memory.pools.Code-Cache.init = 255
      5904], [memory.non-heap.committed = 63832064], [GC.G1-Young-Generation.time = 677], [receive-queue-insert_failures = 0.0], [memory.total.init = 130482176], [GC.G1-Old-Generation.count = 0], [memory.pools.Metaspace.init = 0], [memory.pools.G1-Survivor-Space.commi
      tted = 5242880], [worker-transfer-queue-population = 0], [memory.pools.Compressed-Class-Space.committed = 6684672], [threads.timed_waiting.count = 31], [memory.pools.G1-Eden-Space.init = 7340032], [memory.pools.Metaspace.max = -1], [memory.pools.G1-Survivor-Spac
      e.used = 5242880], [memory.heap.init = 127926272], [memory.pools.G1-Old-Gen.used-after-gc = 0], [worker-transfer-queue-capacity = 1024], [memory.pools.G1-Survivor-Space.used-after-gc = 5242880], [memory.pools.G1-Old-Gen.committed = 47185920], [memory.pools.G1-Ed
      en-Space.committed = 75497472], [receive-queue-arrival_rate_secs = 0.109421162052741], [memory.pools.Compressed-Class-Space.usage = 0.0058090537786483765], [TGT-TimeToExpiryMsecs = 71282993], [threads.runnable.count = 15], [worker-transfer-queue-insert_failures
      = 0.0], [worker-transfer-queue-sojourn_time_ms = 0.0], [memory.heap.committed = 127926272], [memory.non-heap.max = -1], [threads.daemon.count = 29], [memory.pools.Code-Cache.max = 251658240], [worker-transfer-queue-arrival_rate_secs = 90.47776674390379], [memory
      .heap.usage = 0.037109360098838806], [memory.pools.G1-Old-Gen.init = 120586240], [memory.pools.Code-Cache.committed = 15138816], [receive-queue-pct_full = 0.0], [worker-transfer-queue-pct_full = 0.0], [receive-queue-population = 0], [memory.pools.Compressed-Clas
      s-Space.init = 0], [memory.pools.Code-Cache.usage = 0.059299468994140625], [worker-transfer-queue-dropped_messages = 0], [GC.G1-Young-Generation.count = 18], [memory.pools.Code-Cache.used = 14923200], [memory.pools.G1-Old-Gen.usage = 0.012695297598838806], [memo
      ry.non-heap.usage = -6.196368E7], [memory.total.max = 1073741823], [threads.count = 51], [memory.heap.used = 39845872], [memory.pools.G1-Survivor-Space.init = 0], [memory.pools.G1-Old-Gen.used = 13631472], [receive-queue-dropped_messages = 0], [threads.terminate
      d.count = 0], [memory.pools.G1-Eden-Space.max = -1], [uptimeSecs = 76], [threads.deadlock.count = 0], [threads.blocked.count = 0], [newWorkerEvent = 1], [receive-queue-capacity = 32768], [threads.new.count = 0], [startTimeSecs = 1610568920], [memory.pools.G1-Ede
      n-Space.used-after-gc = 0], [memory.pools.G1-Eden-Space.used = 20971520], [GC.G1-Old-Generation.time = 0], [memory.non-heap.used = 61964384], [memory.pools.G1-Old-Gen.max = 1073741824], [memory.pools.G1-Survivor-Space.max = -1], [memory.pools.G1-Survivor-Space.u
      sage = 1.0], [memory.total.committed = 191823872], [doHeartbeat-calls.count = 64], [doHeartbeat-calls.m1_rate = 1.0730202200365234E-6], [doHeartbeat-calls.m5_rate = 1.1636999000665182E-6], [doHeartbeat-calls.m15_rate = 1.1870955900857726E-6], [doHeartbeat-calls.
      mean_rate = 1.0067076836696486E-6]]] PROC_START_TIME(sampled): null EXEC_START_TIME(sampled): null]
      
      ...
      
      2021-01-13 20:16:37.030 o.a.s.u.Utils Thread-16-__system-executor[-1, -1] [ERROR] Async loop died!
      java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Class is not registered: org.apache.storm.generated.NodeInfo
      Note: To register this class use: kryo.register(org.apache.storm.generated.NodeInfo.class);
      Serialization trace:
      value (org.apache.storm.metric.api.IMetricsConsumer$DataPoint)
              at org.apache.storm.executor.Executor.accept(Executor.java:294) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:159) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:145) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.Utils$1.run(Utils.java:401) [storm-client-2.3.0.y.jar:2.3.0.y]
              at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
      Caused by: com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Class is not registered: org.apache.storm.generated.NodeInfo
      Note: To register this class use: kryo.register(org.apache.storm.generated.NodeInfo.class);
      Serialization trace:
      value (org.apache.storm.metric.api.IMetricsConsumer$DataPoint)
              at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:101) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) ~[kryo-3.0.3.jar:?]
      Serialization trace:
      value (org.apache.storm.metric.api.IMetricsConsumer$DataPoint)
              at org.apache.storm.executor.Executor.accept(Executor.java:294) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:159) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:145) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.utils.Utils$1.run(Utils.java:401) [storm-client-2.3.0.y.jar:2.3.0.y]
              at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
      Caused by: com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Class is not registered: org.apache.storm.generated.NodeInfo
      Note: To register this class use: kryo.register(org.apache.storm.generated.NodeInfo.class);
      Serialization trace:
      value (org.apache.storm.metric.api.IMetricsConsumer$DataPoint)
              at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:101) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) ~[kryo-3.0.3.jar:?]
              at org.apache.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:38) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:40) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.daemon.worker.WorkerTransfer.tryTransferRemote(WorkerTransfer.java:118) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.daemon.worker.WorkerState.tryTransferRemote(WorkerState.java:553) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.ExecutorTransfer.tryTransfer(ExecutorTransfer.java:68) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.daemon.Task.sendUnanchored(Task.java:215) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.Executor.metricsTick(Executor.java:345) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:205) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.Executor.accept(Executor.java:290) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              ... 6 more
      Caused by: java.lang.IllegalArgumentException: Class is not registered: org.apache.storm.generated.NodeInfo
      Note: To register this class use: kryo.register(org.apache.storm.generated.NodeInfo.class);
              at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:488) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:97) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:106) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:39) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) ~[kryo-3.0.3.jar:?]
              at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534) ~[kryo-3.0.3.jar:?]
              at org.apache.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:38) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:40) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.daemon.worker.WorkerTransfer.tryTransferRemote(WorkerTransfer.java:118) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.daemon.worker.WorkerState.tryTransferRemote(WorkerState.java:553) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.ExecutorTransfer.tryTransfer(ExecutorTransfer.java:68) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.daemon.Task.sendUnanchored(Task.java:215) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.Executor.metricsTick(Executor.java:345) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:205) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              at org.apache.storm.executor.Executor.accept(Executor.java:290) ~[storm-client-2.3.0.y.jar:2.3.0.y]
              ... 6 more
      

      The related metric is "__send-iconnection" from https://github.com/apache/storm/blob/7bef73a6faa14558ef254efe74cbe4bfef81c2e2/storm-client/src/jvm/org/apache/storm/daemon/metrics/BuiltinMetricsUtil.java#L40-L43

      Note that this can only be reproduced when metrics are sent across workers (otherwise there is no serialization).

      The work around is one of the following
      1) add org.apache.storm.generated.NodeInfo to topology.kryo.register in topology conf
      2) set topology.fall.back.on.java.serialization true or unset topology.fall.back.on.java.serialization since the default is true

      The fix is to register NodeInfo class in kryo.
      https://github.com/apache/storm/blob/7bef73a6faa14558ef254efe74cbe4bfef81c2e2/storm-client/src/jvm/org/apache/storm/serialization/SerializationFactory.java#L67-L77

      Attachments

        Issue Links

          Activity

            People

              ethanli Ethan Li
              ethanli Ethan Li
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m