ActiveMQ
  1. ActiveMQ
  2. AMQ-1376

Improperly closed connections preventing message redelivery

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: 5.0.0
    • Fix Version/s: AGING_TO_DIE
    • Component/s: Transport
    • Labels:
      None

      Description

      This is a reproducible case of a DEAD Consumer that never gets cleaned up.

      I am using telnet to manually test STOMP message consumption.

      First I put a message into the queue

      I then connect and subscribe to that queue and get the message:

      CONNECT
      login: test
      passcode: test

      ^@
      CONNECTED
      session:ID:jacob-64807-1188509209664-4:3

      SUBSCRIBE
      destination: /queue/Prescriptions
      ack: client

      ^@

      This works and I receive the queued messages.
      They remain in the Q because I am not send ACK

      If I use the DISCONNECT command. I am properly disconnected and I can repeat this process to get the same message again. Good.

      If I disconnect by killing the telnet process I see the following stack trace in MQ. AND I can still repeat the same process of re-retrieving the un-acknowledged messages:
      DEBUG Transport - Transport failed: java.io.EOFException
      java.io.EOFException
      at java.io.DataInputStream.readByte(DataInputStream.java:243)
      at org.apache.activemq.transport.stomp.StompWireFormat.readLine(StompWireFormat.java:186)
      at org.apache.activemq.transport.stomp.StompWireFormat.unmarshal(StompWireFormat.java:94)
      at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:196)
      at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:188)
      at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:176)
      at java.lang.Thread.run(Thread.java:613)
      DEBUG TransportConnection - Stopping connection: /XXXXXXXXXXXXXX:4880
      DEBUG TcpTransport - Stopping transport tcp:///XXXXXXXXXXXXXX:4880
      DEBUG TransportConnection - Stopped connection: /XXXXXXXXXXXXXX:4880
      DEBUG TransportConnection - Cleaning up connection resources: /XXXXXXXXXXXXXX:4880
      DEBUG AMQPersistenceAdapter - Checkpoint started.
      DEBUG AMQPersistenceAdapter - Checkpoint done.

      HOWEVER,
      If I disconnect by repeatedly typing Control-C to close the telnet program I see the following stack trace:
      DEBUG Transport - Transport failed: org.apache.activemq.transport.stomp.ProtocolException: Unable to parser header line [????????????]
      org.apache.activemq.transport.stomp.ProtocolException: Unable to parser header line [????????????]
      at org.apache.activemq.transport.stomp.StompWireFormat.unmarshal(StompWireFormat.java:121)
      at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:196)
      at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:188)
      at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:176)
      at java.lang.Thread.run(Thread.java:613)
      DEBUG TransportConnection - Stopping connection: /XXXXXXXXXXXXXX:64820
      DEBUG TcpTransport - Stopping transport tcp:///XXXXXXXXXXXXXX:64820
      DEBUG AMQPersistenceAdapter - Checkpoint started.
      DEBUG AMQPersistenceAdapter - Checkpoint done.

      AND, I am no longer able to retrieve the queued up messages. Looking at the admin console I see Number Of Consumers = 1, leading me to believe that ActiveMQ didn't properly handle the disconnection. In the other 2 cases (DISCONNECT and kill) the "Number Of Consumers" drops to zero on connection termination.

      I believe the correct behavior should be to properly handle and clean-up the connection on bad data. Or perhaps periodically check each of the supposed "Consumers" to make sure that they are still alive. This is clearly a reproducible case of a DEAD Consumer that never gets cleaned up.

      Comparing the DEBUG output the follows the 2 stack traces, it is clear in the second case that ActiveMQ fails to clean up the connections resources for the unexpectedly disconnected consumer.

        Activity

        Jacob Burkhart created issue -
        Hide
        Roger Hoover added a comment -

        This problem doesn't just prevent redelivery. The thread assigned to that STOMP connection never gets released. Over time, these unused threads build up until some limit is reached and broker becomes unresponsive.

        Show
        Roger Hoover added a comment - This problem doesn't just prevent redelivery. The thread assigned to that STOMP connection never gets released. Over time, these unused threads build up until some limit is reached and broker becomes unresponsive.
        Hide
        Hiram Chirino added a comment -

        I believe that the improved connection shutdown code in the current trunk should resolve this issue. Could you re-test against trunk and let me know if you still see the same thing?

        Show
        Hiram Chirino added a comment - I believe that the improved connection shutdown code in the current trunk should resolve this issue. Could you re-test against trunk and let me know if you still see the same thing?
        Hiram Chirino made changes -
        Field Original Value New Value
        Assignee Hiram Chirino [ chirino ]
        Rob Davies made changes -
        Fix Version/s 5.3.0 [ 11914 ]
        Gary Tully made changes -
        Fix Version/s 5.4.0 [ 12110 ]
        Fix Version/s 5.3.0 [ 11914 ]
        Bruce Snyder made changes -
        Fix Version/s 5.4.0 [ 12110 ]
        Fix Version/s AGING_TO_DIE [ 12187 ]
        Hide
        Bruce Snyder added a comment -

        Please provide feedback regarding this issue and whether it been fixed in the latest release of ActiveMQ (5.3). If there is no feedback, this issue will be closed.

        Show
        Bruce Snyder added a comment - Please provide feedback regarding this issue and whether it been fixed in the latest release of ActiveMQ (5.3). If there is no feedback, this issue will be closed.
        Hide
        Vipul Kothiyal added a comment -

        This problem still persists with 5.3.0, MQ doesn't close teh connection cleanly.

        2009-12-08 17:00:07,762 | DEBUG | Stopped transport: /10.52.68.200:37123 | org.apache.activemq.broker.TransportConnection | ActiveMQ Transport Stopper: /10.52.68.200:37123
        2009-12-08 17:00:07,762 | DEBUG | Cleaning up connection resources: /10.52.68.200:37123 | org.apache.activemq.broker.TransportConnection | ActiveMQ Transport Stopper: /10.52.68.200:37123
        2009-12-08 17:00:07,762 | DEBUG | Connection Stopped: /10.52.68.200:37123 | org.apache.activemq.broker.TransportConnection | ActiveMQ Transport Stopper: /10.52.68.200:37123
        2009-12-08 17:00:07,763 | DEBUG | Transport failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///10.52.68.200:37122
        java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:243)
        at org.apache.activemq.transport.stomp.StompWireFormat.readLine(StompWireFormat.java:186)
        at org.apache.activemq.transport.stomp.StompWireFormat.unmarshal(StompWireFormat.java:94)
        at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:210)
        at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:202)
        at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:185)
        at java.lang.Thread.run(Thread.java:595)

        Show
        Vipul Kothiyal added a comment - This problem still persists with 5.3.0, MQ doesn't close teh connection cleanly. 2009-12-08 17:00:07,762 | DEBUG | Stopped transport: /10.52.68.200:37123 | org.apache.activemq.broker.TransportConnection | ActiveMQ Transport Stopper: /10.52.68.200:37123 2009-12-08 17:00:07,762 | DEBUG | Cleaning up connection resources: /10.52.68.200:37123 | org.apache.activemq.broker.TransportConnection | ActiveMQ Transport Stopper: /10.52.68.200:37123 2009-12-08 17:00:07,762 | DEBUG | Connection Stopped: /10.52.68.200:37123 | org.apache.activemq.broker.TransportConnection | ActiveMQ Transport Stopper: /10.52.68.200:37123 2009-12-08 17:00:07,763 | DEBUG | Transport failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///10.52.68.200:37122 java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:243) at org.apache.activemq.transport.stomp.StompWireFormat.readLine(StompWireFormat.java:186) at org.apache.activemq.transport.stomp.StompWireFormat.unmarshal(StompWireFormat.java:94) at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:210) at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:202) at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:185) at java.lang.Thread.run(Thread.java:595)
        Hide
        Sergey Rem added a comment -

        May be it exists also in 5.4.0 and in 5.4.1. After one wrong action this exception will be written in log without stop: but all messages will be processed in the same time also... only log file is growing!

        2010-10-04 18:17:57,649 | INFO | Transport failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///172.16.1.25:24396
        java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:243)
        at org.apache.activemq.transport.stomp.StompWireFormat.readLine(StompWireFormat.java:146)
        at org.apache.activemq.transport.stomp.StompWireFormat.parseAction(StompWireFormat.java:162)
        at org.apache.activemq.transport.stomp.StompWireFormat.unmarshal(StompWireFormat.java:92)
        at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:226)
        at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:218)
        at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:201)
        at java.lang.Thread.run(Thread.java:595)

        Show
        Sergey Rem added a comment - May be it exists also in 5.4.0 and in 5.4.1. After one wrong action this exception will be written in log without stop: but all messages will be processed in the same time also... only log file is growing! 2010-10-04 18:17:57,649 | INFO | Transport failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///172.16.1.25:24396 java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:243) at org.apache.activemq.transport.stomp.StompWireFormat.readLine(StompWireFormat.java:146) at org.apache.activemq.transport.stomp.StompWireFormat.parseAction(StompWireFormat.java:162) at org.apache.activemq.transport.stomp.StompWireFormat.unmarshal(StompWireFormat.java:92) at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:226) at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:218) at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:201) at java.lang.Thread.run(Thread.java:595)
        Jeff Turner made changes -
        Project Import Fri Nov 26 22:32:02 EST 2010 [ 1290828722158 ]
        Hide
        Martin Rehder added a comment -

        It also exists in 5.4.1.

        The exception is constantly printed in the console window where AMQ is started from.

        Show
        Martin Rehder added a comment - It also exists in 5.4.1. The exception is constantly printed in the console window where AMQ is started from.
        Hide
        Tom Purcell added a comment -

        We have been in production with a ruby on rails app using Stomp to put messages in queues in AMQ 5.3.0 for 6 months. We never saw these messages. We are now in the process of testing an upgrade to AMQ 5.4.2 and we started to get the following:
        2011-02-21 13:13:23,464 | DEBUG | Transport failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///172.20.10.127:43880
        java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.activemq.transport.stomp.StompWireFormat.readLine(StompWireFormat.java:146)
        at org.apache.activemq.transport.stomp.StompWireFormat.parseAction(StompWireFormat.java:162)
        at org.apache.activemq.transport.stomp.StompWireFormat.unmarshal(StompWireFormat.java:92)
        at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:227)
        at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:219)
        at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:202)
        at java.lang.Thread.run(Thread.java:619)
        2011-02-21 13:13:23,470 | DEBUG | Stopping connection: /172.20.10.127:43880 | org.apache.activemq.broker.TransportConnection | ActiveMQ Task
        2011-02-21 13:13:23,470 | DEBUG | Stopping transport tcp:///172.20.10.127:43880 | org.apache.activemq.transport.tcp.TcpTransport | ActiveMQ Task
        2011-02-21 13:13:23,477 | DEBUG | Stopped transport: /172.20.10.127:43880 | org.apache.activemq.broker.TransportConnection | ActiveMQ Task

        The only change in the environment is the AMQ upgrade.

        In our testing we are pushing load and not seeing any lost messages. I have seen comments like this in various places (http://activemq.2283324.n4.nabble.com/Continuous-Transport-Failures-Moments-Before-STOMP-Connections-Close-tp2952079p2953913.html):

        this log just indicate that connection is lost, it's no indication of
        an error. We recently raised the level of these messages to catch some
        other situations, so they now shows with the INFO level in standard
        logs

        This sounds reasonable but Can I really just ignore this? At any rate it looks like the issue still exists.

        Show
        Tom Purcell added a comment - We have been in production with a ruby on rails app using Stomp to put messages in queues in AMQ 5.3.0 for 6 months. We never saw these messages. We are now in the process of testing an upgrade to AMQ 5.4.2 and we started to get the following: 2011-02-21 13:13:23,464 | DEBUG | Transport failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///172.20.10.127:43880 java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:250) at org.apache.activemq.transport.stomp.StompWireFormat.readLine(StompWireFormat.java:146) at org.apache.activemq.transport.stomp.StompWireFormat.parseAction(StompWireFormat.java:162) at org.apache.activemq.transport.stomp.StompWireFormat.unmarshal(StompWireFormat.java:92) at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:227) at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:219) at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:202) at java.lang.Thread.run(Thread.java:619) 2011-02-21 13:13:23,470 | DEBUG | Stopping connection: /172.20.10.127:43880 | org.apache.activemq.broker.TransportConnection | ActiveMQ Task 2011-02-21 13:13:23,470 | DEBUG | Stopping transport tcp:///172.20.10.127:43880 | org.apache.activemq.transport.tcp.TcpTransport | ActiveMQ Task 2011-02-21 13:13:23,477 | DEBUG | Stopped transport: /172.20.10.127:43880 | org.apache.activemq.broker.TransportConnection | ActiveMQ Task The only change in the environment is the AMQ upgrade. In our testing we are pushing load and not seeing any lost messages. I have seen comments like this in various places ( http://activemq.2283324.n4.nabble.com/Continuous-Transport-Failures-Moments-Before-STOMP-Connections-Close-tp2952079p2953913.html): this log just indicate that connection is lost, it's no indication of an error. We recently raised the level of these messages to catch some other situations, so they now shows with the INFO level in standard logs This sounds reasonable but Can I really just ignore this? At any rate it looks like the issue still exists.
        Hide
        Timothy Bish added a comment -

        This is in some ways related to AMQ-3393. There is also the fact that Stomp 1.0 doesn't allow for a keep alive so sometimes connection drops aren't detected. The processing logic for dropped connections has been improved quite a bit so the queued messages should be redelivered to another client if the dropped connection is detected.

        Reopen if the problem persists.

        Show
        Timothy Bish added a comment - This is in some ways related to AMQ-3393 . There is also the fact that Stomp 1.0 doesn't allow for a keep alive so sometimes connection drops aren't detected. The processing logic for dropped connections has been improved quite a bit so the queued messages should be redelivered to another client if the dropped connection is detected. Reopen if the problem persists.
        Timothy Bish made changes -
        Status Open [ 1 ] Closed [ 6 ]
        Resolution Duplicate [ 3 ]

          People

          • Assignee:
            Hiram Chirino
            Reporter:
            Jacob Burkhart
          • Votes:
            8 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development