Qpid: QPID-3828

When sending large messages loss of connection is not detected even with heartbeats enabled

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14, 0.15
    • Fix Version/s: 0.23
    • Component/s: C++ Client
    • Labels:
      None

      Description

      E.g. run

      qpid-send --broker <remote-broker> --connection-options {heartbeat:8} --messages 0 --content-size 1600 --report-every 1000 --address amq.topic

      Then, after some time, pull the network cable (this is necessary; a kill -STOP on the broker is not sufficient). The test continues sending messages and eventually hangs once the buffers are full. If you reconnect the cable, the connection then fails; if you don't, the failure is never detected.

      If you reduce the message size, e.g. to 100 bytes, the connection fails as expected after two heartbeat intervals (16 seconds in this case).
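The expected timing can be sketched as follows. This is a minimal model under stated assumptions, not Qpid's code: `IdleTimeoutModel` is a hypothetical name, and the only rule it encodes is the one given above, that a connection is declared failed once nothing has been received for two heartbeat intervals (2 x 8 s = 16 s).

```cpp
#include <cassert>
#include <chrono>

// Hypothetical model of the idle-timeout rule described in the report:
// with heartbeat interval H, the connection is considered dead after 2*H
// without any incoming frame. Not Qpid's implementation.
class IdleTimeoutModel {
public:
    using Clock = std::chrono::steady_clock;

    explicit IdleTimeoutModel(std::chrono::seconds heartbeat)
        : timeout(2 * heartbeat) {}

    // Any received frame (heartbeat or data) resets the idle period.
    void frameReceived(Clock::time_point now) { lastReceived = now; }

    // True once the idle period reaches the timeout.
    bool expired(Clock::time_point now) const {
        return now - lastReceived >= timeout;
    }

    std::chrono::seconds timeoutValue() const { return timeout; }

private:
    std::chrono::seconds timeout;
    Clock::time_point lastReceived{};
};
```

Time points are passed in explicitly rather than read from the clock, so the rule can be checked without sleeping. The bug is not in this arithmetic: with large messages the timeout does fire, but, as the comments below explain, the resulting close is never carried out.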

        Activity

        Gordon Sim added a comment -

        When the idle timeout fires, the callback request works as expected and we call AsyncIO::queueWriteClose(). However, if at that point the socket is not writable (i.e. its buffers are full), we never get the AsyncIO::writable() callback again (at least not until the network connection is re-established), and that callback is where the actual close is handled.
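The failure mode can be illustrated with a deliberately simplified model. `ModelAsyncIO` and `runEventLoop` are hypothetical names, not Qpid's real `AsyncIO` class; the sketch assumes only what the comment states: `queueWriteClose()` merely records the close, and the close is performed from the `writable()` callback, which the event loop dispatches only while the send buffer has room.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of the behaviour described above.
// ModelAsyncIO is NOT Qpid's AsyncIO class; it only mimics the fact that a
// queued close is carried out from the "writable" callback.
class ModelAsyncIO {
public:
    // The idle-timeout handler ends up here: the close is only recorded.
    void queueWriteClose() { closePending = true; }

    // Dispatched by the (simulated) event loop when the socket can accept data.
    void writable() {
        if (closePending) closed = true;  // the actual close happens only here
    }

    bool isClosed() const { return closed; }

private:
    bool closePending = false;
    bool closed = false;
};

// Hypothetical event loop: writable events are only dispatched while the
// send buffer has room. With the cable pulled, the buffer stays full.
inline void runEventLoop(ModelAsyncIO& aio, bool sendBufferFull, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        if (!sendBufferFull) aio.writable();
    }
}
```

With a full buffer the pending close is never acted on, however long the loop runs; with room in the buffer, one iteration is enough. This matches the observation that small messages (which never fill the buffer) fail over correctly.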

        Gordon Sim added a comment -

        The following change 'fixes' the issue as described, but I suspect it may cause other problems (e.g. leaks or dangling pointers under other conditions)?

        Index: src/qpid/client/TCPConnector.cpp
        ===================================================================
        --- src/qpid/client/TCPConnector.cpp	(revision 1234761)
        +++ src/qpid/client/TCPConnector.cpp	(working copy)
        @@ -160,7 +160,7 @@
             if (!closed) {
                 if (aio) {
                     // Established connection
        -            aio->requestCallback(boost::bind(&TCPConnector::eof, this, _1));
        +            aio->requestCallback(boost::bind(&TCPConnector::disconnected, this, _1));
                 } else if (connector) {
                     // We're still connecting
                     connector->stop();
        
        
        Andrew Stitcher added a comment -

        In fact, I think this change is indeed the correct fix for this problem.

        In effect, it treats a heartbeat failure as if the other side had simply closed the connection at its end.

        The only potential problem is the connection recovering before the socket is actually closed, and this is prevented by running the disconnect operation (which ultimately calls close() on the socket) "on the socket's thread".
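The effect of the patch can be sketched with the same kind of simplified model. Everything here is hypothetical (`ModelCallbackIO`, `ModelConnector`, `runOnce`); it only assumes what the comments describe: a requested callback is dispatched on the I/O thread regardless of whether the socket is writable, and `disconnected()` tears the connection down as if the peer had closed it, rather than queueing a write-side close.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of the patched behaviour; ModelCallbackIO and
// ModelConnector are illustrative names, not Qpid classes. A requested
// callback is dispatched on the I/O thread whether or not the socket is
// writable, so a full send buffer can no longer block the close.
class ModelCallbackIO {
public:
    using Callback = std::function<void(ModelCallbackIO&)>;

    // In the real client this may be called from another thread (e.g. a
    // timer); this single-threaded model simply queues the request.
    void requestCallback(Callback cb) { pending.push_back(std::move(cb)); }

    // One iteration of the simulated I/O thread. Writable events (not
    // modelled here) would depend on buffer space; requested callbacks do not.
    void runOnce() {
        while (!pending.empty()) {
            Callback cb = std::move(pending.front());
            pending.pop_front();
            cb(*this);
        }
    }

    void close() { closed = true; }
    bool isClosed() const { return closed; }

private:
    std::deque<Callback> pending;
    bool closed = false;
};

// Hypothetical connector whose disconnected() handles a heartbeat timeout
// like a peer-initiated close: tear down the socket, then report the failure.
struct ModelConnector {
    bool reportedFailure = false;

    void disconnected(ModelCallbackIO& io) {
        // Per the comment above, doing this on the socket's own thread is
        // what prevents the connection from recovering before the close.
        io.close();
        reportedFailure = true;  // upper layer learns the connection is gone
    }
};
```

In this model the timeout handler would call `io.requestCallback([&conn](ModelCallbackIO& s) { conn.disconnected(s); })`, and the next pass of the I/O loop closes the socket even though no writable event ever arrives.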

        Andrew Stitcher added a comment -

        This issue has been fixed on trunk in r1475803.

        Justin Ross added a comment -

        Released in Qpid 0.24, http://qpid.apache.org/releases/qpid-0.24/index.html

          People

          • Assignee:
            Andrew Stitcher
            Reporter:
            Gordon Sim
          • Votes:
            0
            Watchers:
            2
