  1. Apache Storm
  2. STORM-946

We should remove Closed Client from cached-node+port->socket in worker

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0, 1.0.0
    • Fix Version/s: None
    • Component/s: storm-core
    • Labels:
      None

      Description

      A Client may end up in the Closed state after a failed reconnect, and we remove the closed Client from Context to avoid a memory leak.
      But the worker's cached-node+port->socket also holds a reference to the closed Client, so we should remove the closed Client from cached-node+port->socket as well.
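
      The double reference described above can be sketched roughly as follows. This is a minimal model, not Storm's code: `FakeClient`, `contextConnections`, `pruneClosed`, and the `host1:6700` key are hypothetical stand-ins, and the worker's cached-node+port->socket is modelled as a plain Java map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a messaging client; not Storm's actual Client class.
class FakeClient {
    private boolean closed = false;
    void close() { closed = true; }
    boolean isClosed() { return closed; }
}

class ClosedClientPruning {
    // Removes every entry whose client is closed and returns how many were removed.
    static int pruneClosed(Map<String, FakeClient> cache) {
        int before = cache.size();
        cache.values().removeIf(FakeClient::isClosed);
        return before - cache.size();
    }

    public static void main(String[] args) {
        // Two independent maps that both reference the same client object.
        Map<String, FakeClient> contextConnections = new HashMap<>();
        Map<String, FakeClient> cachedNodePortToSocket = new HashMap<>();

        FakeClient client = new FakeClient();
        contextConnections.put("host1:6700", client);
        cachedNodePortToSocket.put("host1:6700", client);

        // Reconnect failed: the client closes and the Context drops it.
        client.close();
        contextConnections.remove("host1:6700");

        // The worker-side cache still references the closed client.
        System.out.println("still cached: "
                + cachedNodePortToSocket.containsKey("host1:6700")); // still cached: true

        // Pruning the worker-side cache releases the last reference.
        System.out.println("removed: " + pruneClosed(cachedNodePortToSocket)); // removed: 1
    }
}
```

      Removing the entry from only one of the two maps leaves the closed client reachable, which is why the cache needs its own cleanup.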

      There is a second reason to do this. Consider the following situation: worker A is connected to workers B1 and B2, and for some reason B1 and B2 die at the same time, so nimbus reschedules them. The new B1 and B2 may be partly rescheduled onto the same host:port as the old ones, for example (old B1: host1+port1, old B2: host2+port2, new B1: host2+port2, new B2: host3+port3). Worker A notices that B1 and B2 died and starts reconnecting to them. But because new B1 and old B2 have the same host+port, the current logic removes the old B1 Client, creates a new Client for new B2, and does nothing for old B2 and new B1. The Client cached for host2+port2 may already be Closed, so the topology stops processing tuples. If we remove closed Clients from cached-node+port->socket before refresh-connections, this will not happen.
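
      The rescheduling scenario can be reproduced with a small simulation. It is a sketch under assumptions, not the worker's real refresh-connections: `Conn`, `refresh`, `pruneClosed`, and `sharedEndpointUsable` are hypothetical names, and `refresh` only imitates a diff-based update that leaves an endpoint untouched when it appears in both the old and the new assignment.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RefreshConnectionsSketch {
    // Hypothetical stand-in for a messaging client.
    static final class Conn {
        final String endpoint;
        boolean closed;
        Conn(String endpoint) { this.endpoint = endpoint; }
    }

    // Diff-based refresh: create clients for new endpoints, close and drop
    // clients for endpoints no longer needed, leave matching endpoints as-is.
    static void refresh(Map<String, Conn> cache, Set<String> needed) {
        for (String endpoint : needed) {
            cache.computeIfAbsent(endpoint, Conn::new);
        }
        cache.entrySet().removeIf(e -> {
            if (needed.contains(e.getKey())) {
                return false;
            }
            e.getValue().closed = true;
            return true;
        });
    }

    // Proposed extra step: drop closed clients before refreshing.
    static void pruneClosed(Map<String, Conn> cache) {
        cache.values().removeIf(c -> c.closed);
    }

    // Runs the scenario; returns true if host2:port2 ends up with an open client.
    static boolean sharedEndpointUsable(boolean pruneFirst) {
        Map<String, Conn> cache = new HashMap<>();
        // Old B1 and old B2.
        refresh(cache, new HashSet<>(Arrays.asList("host1:port1", "host2:port2")));
        // Both workers die; reconnects fail and the clients end up closed.
        cache.values().forEach(c -> c.closed = true);
        if (pruneFirst) {
            pruneClosed(cache);
        }
        // New B1 reuses host2:port2; new B2 lands on host3:port3.
        refresh(cache, new HashSet<>(Arrays.asList("host2:port2", "host3:port3")));
        return !cache.get("host2:port2").closed;
    }

    public static void main(String[] args) {
        System.out.println("without pruning: " + sharedEndpointUsable(false)); // without pruning: false
        System.out.println("with pruning: " + sharedEndpointUsable(true));     // with pruning: true
    }
}
```

      Without the pruning step, the entry for host2:port2 survives the refresh as a closed client. With it, the refresh sees host2:port2 as a new endpoint and creates a fresh client.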

              People

              • Assignee:
                Unassigned
                Reporter:
                tedxia xiajun
              • Votes:
                1
                Watchers:
                3