Qpid / QPID-3828

When sending large messages loss of connection is not detected even with heartbeats enabled

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14, 0.15
    • Fix Version/s: 0.23
    • Component/s: C++ Client
    • Labels:
      None

      Description

      E.g. run

          qpid-send --broker <remote-broker> --connection-options {heartbeat:8} --messages 0 --content-size 1600 --report-every 1000 --address amq.topic

      Then, after some time, pull the network cable (this is required; a kill -STOP on the broker is not sufficient). The test will continue sending messages, then eventually hang (when buffers are full). If you reconnect the cable, the connection will fail, but if you don't, it will not.

      If you reduce the size, e.g. to 100 bytes, the connection fails as expected after two heartbeat intervals (16 secs in this case).
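
The timing in the report can be written down as a small check. This is only a sketch of the arithmetic, assuming (as the description states) that a connection is considered lost after two heartbeat intervals without traffic; the names idleTimeout, connectionExpired and kMissedIntervals are hypothetical and are not taken from the Qpid sources.

```cpp
#include <cassert>
#include <chrono>

using Seconds = std::chrono::seconds;

// Assumption from the report: two heartbeat intervals without traffic
// mean the connection is treated as lost.
constexpr int kMissedIntervals = 2;

// Idle period after which the connection should be failed.
Seconds idleTimeout(Seconds heartbeat) {
    assert(heartbeat.count() > 0);
    return heartbeat * kMissedIntervals;
}

// True once the time since the last traffic reaches the idle timeout.
bool connectionExpired(Seconds sinceLastTraffic, Seconds heartbeat) {
    return sinceLastTraffic >= idleTimeout(heartbeat);
}
```

With heartbeat:8 as in the command above, idleTimeout(Seconds(8)) is 16 seconds, which matches the interval observed for small messages.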

        Activity

        Gordon Sim created issue -
        Gordon Sim made changes -
        Field: Original Value -> New Value
        Affects Version/s: (none) -> 0.15 [ 12319043 ]
        Gordon Sim made changes -
        Summary: "Sending large messages does not fail when connection is lost" -> "When sending large messages loss of connection is not detected even with heartbeats enabled"
        Gordon Sim added a comment -

        When the idle timeout fires, the callback request works as expected and we call AsyncIO::queueWriteClose(). However, if the socket is not writable at that point (i.e. buffers are full), we never get the AsyncIO::writable() callback again (at least until the network connection is re-established), and that callback is where the actual close is handled.
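
The stall described above can be modeled in a few lines. The sketch below is a toy, not Qpid's AsyncIO: the class ToyAsyncIO, its poll() method and the helper closedAfterTimeout are hypothetical names, and the only behavior borrowed from the comment is that a queued close is acted on inside the writable callback.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for an async I/O object (hypothetical, not qpid::sys::AsyncIO).
class ToyAsyncIO {
public:
    explicit ToyAsyncIO(bool socketWritable) : writable_(socketWritable) {}

    // Ask for a callback on the next poll, whether or not the socket is writable.
    void requestCallback(std::function<void(ToyAsyncIO&)> cb) {
        assert(cb);
        requested_.push_back(std::move(cb));
    }

    // Deferred close: only records the intent to close.
    void queueWriteClose() { closeQueued_ = true; }

    // One iteration of a simulated poller.
    void poll() {
        std::vector<std::function<void(ToyAsyncIO&)>> callbacks;
        callbacks.swap(requested_);
        for (auto& cb : callbacks) cb(*this);
        if (writable_) writableEvent();  // never reached while buffers are full
    }

    bool closed() const { return closed_; }

private:
    // The queued close is only acted on here.
    void writableEvent() {
        if (closeQueued_) closed_ = true;
    }

    bool writable_;
    bool closeQueued_ = false;
    bool closed_ = false;
    std::vector<std::function<void(ToyAsyncIO&)>> requested_;
};

// Simulate an idle timeout that queues a close, then poll a number of times.
bool closedAfterTimeout(bool socketWritable, int polls) {
    ToyAsyncIO aio(socketWritable);
    aio.requestCallback([](ToyAsyncIO& io) { io.queueWriteClose(); });
    for (int i = 0; i < polls; ++i) aio.poll();
    return aio.closed();
}
```

In this model, closedAfterTimeout(true, 1) reports a closed connection, while closedAfterTimeout(false, 100) never does, which mirrors the hang seen once the send buffers fill up.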

        Gordon Sim made changes -
        Assignee: (none) -> Andrew Stitcher [ astitcher ]
        Gordon Sim added a comment -

        The following change 'fixes' the issue as described, but I suspect it may cause other problems (e.g. leaks or dangling pointers under other conditions).

        Index: src/qpid/client/TCPConnector.cpp
        ===================================================================
        --- src/qpid/client/TCPConnector.cpp	(revision 1234761)
        +++ src/qpid/client/TCPConnector.cpp	(working copy)
        @@ -160,7 +160,7 @@
             if (!closed) {
                 if (aio) {
                     // Established connection
        -            aio->requestCallback(boost::bind(&TCPConnector::eof, this, _1));
        +            aio->requestCallback(boost::bind(&TCPConnector::disconnected, this, _1));
                 } else if (connector) {
                     // We're still connecting
                     connector->stop();
        
        
        Andrew Stitcher added a comment -

        In fact, I think this change is indeed the correct fix for this problem.

        In effect, it treats a heartbeat failure as if the other side of the connection had simply closed the connection at its end.

        The only potential problem is the connection recovering before the socket is actually closed. This is prevented by running the disconnect operation (which will ultimately call close() on the socket) "on the socket's thread".
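
The ordering argument can be illustrated with a single-threaded event queue. This is a rough sketch under stated assumptions, not Qpid's poller: ToyLoop, ToyConnection and simulateTimeoutThenRecovery are hypothetical names, and the model only assumes that work posted to "the socket's thread" runs one task at a time, in order, alongside the socket's own events.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy single-threaded event loop standing in for "the socket's thread".
class ToyLoop {
public:
    void post(std::function<void()> task) {
        assert(task);
        queue_.push_back(std::move(task));
    }

    // Run queued tasks one at a time, in the order they were posted.
    void run() {
        while (!queue_.empty()) {
            std::function<void()> task = std::move(queue_.front());
            queue_.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> queue_;
};

struct ToyConnection {
    bool closed = false;
    std::vector<std::string> log;

    // Same handling whether the peer closed or the heartbeat timed out.
    void disconnected() {
        if (closed) return;
        closed = true;
        log.push_back("closed");
    }

    // Traffic arriving on the socket, e.g. after the network comes back.
    void dataArrived() {
        log.push_back(closed ? "ignored data" : "handled data");
    }
};

// Heartbeat timeout is posted first; late traffic is queued behind it.
std::vector<std::string> simulateTimeoutThenRecovery() {
    ToyLoop loop;
    ToyConnection conn;
    loop.post([&conn] { conn.disconnected(); });
    loop.post([&conn] { conn.dataArrived(); });
    loop.run();
    return conn.log;
}
```

Because both tasks go through the same queue, the disconnect cannot interleave with later socket events; in this model the late traffic always sees an already closed connection.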

        Andrew Stitcher added a comment -

        This issue has been fixed on trunk in r1475803.

        Andrew Stitcher made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Fix Version/s: (none) -> 0.23 [ 12324273 ]
        Resolution: (none) -> Fixed [ 1 ]
        Justin Ross added a comment -

        Released in Qpid 0.24, http://qpid.apache.org/releases/qpid-0.24/index.html

        Justin Ross made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]
        Transition          Time In Source Status   Execution Times   Last Executer     Last Execution Date
        Open -> Resolved    438d 23h 23m            1                 Andrew Stitcher   25/Apr/13 14:46
        Resolved -> Closed  135d 22h 51m            1                 Justin Ross       08/Sep/13 13:37

          People

          • Assignee: Andrew Stitcher
          • Reporter: Gordon Sim
          • Votes: 0
          • Watchers: 2
