  1. Qpid
  2. QPID-5747

Federated link ends up in Connecting state forever after connecting to shutting down broker

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.26
    • Fix Version/s: 0.29
    • Component/s: C++ Broker
    • Labels: None

    Description

      Description of problem:
      Consider a federation link with source broker S and destination broker D, where D initiates the TCP connection and messages flow from S to D. If the link attempts to reconnect to S while S is shutting down, the link can stay in the Connecting state forever.

      Version-Release number of selected component (if applicable):
      0.18-11, 0.18-14, 0.18-20

      How reproducible:
      100% after some time

      Steps to Reproduce:
      1. Mimic broker S with a simple python program (server.py):

      import socket
      import sys

      # Create a TCP/IP socket
      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      # Bind the socket to the port
      server_address = ('localhost', 10000)
      sys.stderr.write('starting up on %s port %s\n' % server_address)
      sock.bind(server_address)
      # Listen for incoming connections
      sock.listen(1)
      # Wait for a connection; the script then exits, closing it immediately
      sys.stderr.write('waiting for a connection\n')
      connection, client_address = sock.accept()

      2. In one terminal, run it in a loop:
      while true; do python server.py; done

      2a. Optionally, for observation only: run tcpdump on port 10000

      3. In another terminal, create a federation link to this "server":
      qpid-route link add localhost:5672 localhost:10000

      4. Wait a few seconds and generate any traffic to the broker to keep it busy, e.g.:
      qpid-send -a amq.fanout -m 1000000 --content-size=1000

      5. Watch tcpdump until it stops logging new traffic, then run the following as many times as you wish:
      qpid-route link list

      Actual results:
      Every time, and forever, the link status is Connecting, like this:

      Host       Port   Transport  Durable  State       Last Error
      =============================================================================
      localhost  10000  tcp        N        Connecting  Closed by peer

      (A side observation: for some time the python "server" cannot bind to port 10000 and fails with "Address already in use". That is expected, because the previous TCP connection is still in some FIN_WAIT-like state. But even once the "server" can bind to the port again, the broker does not attempt to reconnect.)
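Step 5 can be scripted rather than checked by eye. The following is a minimal sketch that parses text laid out like the sample output above; the helper name parse_link_list is hypothetical, and the column order (Host, Port, Transport, Durable, State, Last Error) is assumed from that sample rather than from any documented output format.

```python
def parse_link_list(output):
    """Return one dict per link row found in `qpid-route link list` style text."""
    links = []
    for line in output.splitlines():
        # Split into at most six fields so "Last Error" may contain spaces.
        fields = line.split(None, 5)
        # Skip blank lines, the header row, and the ==== separator row.
        if len(fields) < 5 or not fields[1].isdigit():
            continue
        links.append({
            "host": fields[0],
            "port": int(fields[1]),
            "transport": fields[2],
            "durable": fields[3],
            "state": fields[4],
            "last_error": fields[5].strip() if len(fields) > 5 else "",
        })
    return links


# Sample text copied from the "Actual results" output in this report.
SAMPLE = """\
Host Port Transport Durable State Last Error
=============================================================================
localhost 10000 tcp N Connecting Closed by peer
"""

if __name__ == "__main__":
    for link in parse_link_list(SAMPLE):
        print(link["host"], link["port"], link["state"], link["last_error"])
```

Polling this in a loop and seeing only the Connecting state, never Waiting, would match the stuck link described here.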

      Expected results:
      The link status flaps between Waiting and Connecting until the server is ready again and the link becomes Operational (which won't happen in this scenario because of how "server.py" is implemented).

      Additional info:
      The key is that the qpid broker cannot send the initial "AMQP 0-10" frame to the peer. That is, the bug appears if and only if:

      • the TCP connection is fully established (3-way handshake), so that the qpid::broker::connect method returns success,
      • but it is closed so quickly that Link::established is not invoked, i.e. the broker does not react to the connection establishment.

      That is why putting the broker under load helps and speeds up the reproducer.
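The TCP-level half of this condition can be shown without a broker. The sketch below is illustrative only and is not broker code: a listener accepts one connection and closes it without sending a byte (as server.py does when it exits after accept()), while the client's connect call still succeeds. The function names and the use of an ephemeral port are assumptions made for the example.

```python
import socket
import threading


def accept_and_close(listener):
    # Accept one connection and close it at once, without sending any bytes,
    # like server.py exiting right after accept().
    conn, _ = listener.accept()
    conn.close()


def probe():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=accept_and_close, args=(listener,))
    server.start()

    # connect() returns normally once the three-way handshake completes,
    # regardless of what the peer does afterwards.
    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    connected = True
    server.join()

    # The peer has already closed: recv() reports end-of-stream.
    data = client.recv(8)
    client.close()
    listener.close()
    return connected, data


if __name__ == "__main__":
    print(probe())
```

From the client's point of view the connection was established successfully, yet no protocol bytes can ever be exchanged on it, which is the window described in the two bullets above.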

          People

            gsim Gordon Sim
            pmoravec Pavel Moravec