Uploaded image for project: 'Apache Storm'
  1. Apache Storm
  2. STORM-3763

Backpressure message ignored by the receiver caused the topology to not progress

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments


    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.0.0, 2.1.0, 2.2.0
    • 2.3.0, 2.2.1
    • None
    • None


      We have noticed a case where topology is stuck due to the mis-interpretation of backpressure messge:

      At beginning, the topology ran fine but a downstream component had backpressure, so it sent backpressure signal to its upstream component, and the upstream component paused sending data to the downstream bolt.
      Then the downstream component restarted (due to any reason, for example, killed by supervisor due to heartbeat timeout). When it came back up, it sends backpressure message to the upstream bolt. However, the upstream component didn't know how to interpret the backpressure message so it logs the below error and ignores the message.

      2021-01-28 19:41:37.175 o.a.s.m.n.SaslStormClientHandler client-worker-1 [ERROR] Unexpected message from server: {worker=4c38160a-3c66-4eff-8572-2d0c493bd6c1, bpStatusId=254, bpTasks=[], nonBpTasks=[546, 790, 863]}

      Then the downstream component will not receive any data from the upstream component, so it won't have any backpressure (since no data is sent to it), hence it won't send any backpressure update message to the upstream component. This leads to a dead situation that the upstream component thinks the downstream has backpressure so it paused sending data to it, while the downstream doesn't have backpressure but can't receive any data from upstream. The topology is stuck because of it.

      Let's look at the code:

      When the connection between the downstream (server) and upstream (client) is established,
      server invokes



      which sends backpressure messages to the client.

      This is because in this pipeline, "StormServerHandler" is the only one with that implemented "channelActive()" method.


      However, the Client side expects authentication messages.


      so the client can't interpret the backpressure message at the beginning, hence "unexpected message".

      This can be supported with an example. I have a wordcount topology running. At the startup, the client tries to connect to the server. Once connected, it sends a "SASL_TOKEN_MESSAGE_REQUEST".

      client log

      021-01-29 19:03:21.355 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] SASL credentials for storm topology wc is -8603731884381183101:-9091319821854384981
      2021-01-29 19:03:21.359 o.a.s.m.n.Client client-worker-1 [DEBUG] successfully connected to openstorm14blue-n3.blue.ygrid.yahoo.com/, [id: 0x29da2e9c, L:/ - R:openstorm14blue-n3.blue.ygrid.yahoo.com/] [attempt 12]
      2021-01-29 19:03:21.359 o.a.s.m.n.SaslStormClientHandler client-worker-1 [INFO] Connection established from / to openstorm14blue-n3.blue.ygrid.yahoo.com/
      2021-01-29 19:03:21.362 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] Creating saslNettyClient now for channel: [id: 0x29da2e9c, L:/ - R:openstorm14blue-n3.blue.ygrid.yahoo.com/]
      2021-01-29 19:03:21.363 o.a.s.m.n.SaslNettyClient client-worker-1 [DEBUG] SaslNettyClient: Creating SASL DIGEST-MD5 client to authenticate to server
      2021-01-29 19:03:21.368 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] Sending SASL_TOKEN_MESSAGE_REQUEST
      2021-01-29 19:03:21.632 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] send/recv time (ms): 277
      2021-01-29 19:03:21.633 o.a.s.m.n.SaslStormClientHandler client-worker-1 [ERROR] Unexpected message from server: {worker=cdf6f963-678c-45a4-91d2-e1067a9a8516, bpStatusId=1, bpTasks=[], nonBpTasks=[17, 1, 18, 3, 4, 22, 7, 8, 9, 12,

      But the server sends the backpressure message first, before it deals with the SASL_TOKEN_MESSAGE_REQUEST message

      server log

      2021-01-29 19:03:21.473 o.a.s.m.n.SaslStormServerHandler Netty-server-localhost-6702-worker-1 [DEBUG] SASL credentials for storm topology wc is -8603731884381183101:-9091319821854384981
      2021-01-29 19:03:21.482 o.a.s.u.Utils main [DEBUG] Using storm.yaml from resources
      2021-01-29 19:03:21.490 o.a.s.d.w.WorkerState Netty-server-localhost-6702-worker-1 [INFO] Sending BackPressure status to new client. BPStatus: {worker=cdf6f963-678c-45a4-91d2-e1067a9a8516, bpStatusId=1, bpTasks=[], nonBpTasks=[17,
      1, 18, 3, 4, 22, 7, 8, 9, 12, 13]}
      2021-01-29 19:03:21.510 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] SASL credentials for storm topology wc is -8603731884381183101:-9091319821854384981
      2021-01-29 19:03:21.572 o.a.s.s.i.n.u.Recycler Netty-server-localhost-6702-worker-1 [DEBUG] -Dio.netty.recycler.maxCapacityPerThread: 4096
      2021-01-29 19:03:21.573 o.a.s.s.i.n.u.Recycler Netty-server-localhost-6702-worker-1 [DEBUG] -Dio.netty.recycler.maxSharedCapacityFactor: 2
      2021-01-29 19:03:21.573 o.a.s.s.i.n.u.Recycler Netty-server-localhost-6702-worker-1 [DEBUG] -Dio.netty.recycler.linkCapacity: 16
      2021-01-29 19:03:21.574 o.a.s.s.i.n.u.Recycler Netty-server-localhost-6702-worker-1 [DEBUG] -Dio.netty.recycler.ratio: 8
      2021-01-29 19:03:21.575 o.a.s.v.ConfigValidation main [WARN] topology.backpressure.enable is a deprecated config please see class org.apache.storm.Config.TOPOLOGY_BACKPRESSURE_ENABLE for more information.
      2021-01-29 19:03:21.593 o.a.s.s.i.n.b.AbstractByteBuf Netty-server-localhost-6702-worker-1 [DEBUG] -Dorg.apache.storm.shade.io.netty.buffer.checkAccessible: true
      2021-01-29 19:03:21.594 o.a.s.s.i.n.b.AbstractByteBuf Netty-server-localhost-6702-worker-1 [DEBUG] -Dorg.apache.storm.shade.io.netty.buffer.checkBounds: true
      2021-01-29 19:03:21.594 o.a.s.s.i.n.u.ResourceLeakDetectorFactory Netty-server-localhost-6702-worker-1 [DEBUG] Loaded default ResourceLeakDetector: org.apache.storm.shade.io.netty.util.ResourceLeakDetector@524a134b
      2021-01-29 19:03:21.695 o.a.s.m.n.SaslStormServerHandler Netty-server-localhost-6702-worker-1 [DEBUG] No saslNettyServer for [id: 0x6fa65bc5, L:/ - R:/] yet; creating now, with topology token: wc
      2021-01-29 19:03:21.697 o.a.s.m.n.SaslNettyServer Netty-server-localhost-6702-worker-1 [DEBUG] SaslNettyServer: Topology token is: wc with authmethod DIGEST-MD5
      2021-01-29 19:03:21.698 o.a.s.m.n.SaslNettyServer Netty-server-localhost-6702-worker-1 [DEBUG] SaslDigestCallback: Creating SaslDigestCallback handler with topology token: wc


      This is a bug likely introduced in STORM-2306 (https://github.com/apache/storm/pull/2502). This willl happen on every topology when "storm.messaging.netty.authentication" is set true (It is false by default)



          This comment will be Viewable by All Users Viewable by All Users


            ethanli Ethan Li
            ethanli Ethan Li
            0 Vote for this issue
            1 Start watching this issue



              Time Tracking

                Original Estimate - Not Specified
                Not Specified
                Remaining Estimate - 0h
                Time Spent - 20m


                  Issue deployment