Qpid / QPID-3462

Failover is not transparent when using CLIENT_ACK mode

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10, 0.12
    • Fix Version/s: Future
    • Component/s: Java Client
    • Labels:

      Description

      If a session that is using CLIENT_ACK mode fails over to another broker and then calls acknowledge() on a message it received before failover, the client throws an IllegalStateException.

      The JMS spec states that IllegalStateException should only be thrown if the session is closed. When failover happens, neither the JMS session nor the JMS connection is closed; instead, we transparently reconnect, create a new AMQP session and allow the application to continue as if nothing had happened. Therefore, throwing this exception is incorrect.
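      To make the distinction concrete, here is a minimal sketch under stated assumptions: a hypothetical ClientAckSession counts failovers in a generation field, and each delivery is tagged with the generation of the AMQP session it arrived on. It throws IllegalStateException only once the session is closed, and reports an acknowledge of a pre-failover message through a hypothetical AckFailedException (standing in for javax.jms.JMSException, as in option 1 below). None of these names are Qpid classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for javax.jms.JMSException, so the sketch compiles
// without a JMS jar on the classpath.
class AckFailedException extends Exception {
    AckFailedException(String message) {
        super(message);
    }
}

// Hypothetical model of a CLIENT_ACK session that survives failover.
class ClientAckSession {
    private boolean closed = false;
    // Bumped each time the client fails over and opens a new AMQP session.
    private int generation = 0;
    private final List<String> acked = new ArrayList<>();

    int generation() { return generation; }

    // Failover replaces the AMQP session; the JMS session itself stays open.
    void failover() { generation++; }

    void close() { closed = true; }

    List<String> acked() { return acked; }

    // deliveredOn: the generation of the AMQP session the message arrived on.
    void acknowledge(String messageId, int deliveredOn) throws AckFailedException {
        if (closed) {
            // The only case in which the JMS spec calls for IllegalStateException
            // (java.lang's version stands in for javax.jms.IllegalStateException).
            throw new IllegalStateException("session is closed");
        }
        if (deliveredOn != generation) {
            // Option 1: the ack cannot be honoured on the new AMQP session, so
            // tell the application; the broker will redeliver the message.
            throw new AckFailedException(
                "message " + messageId + " was received before failover");
        }
        acked.add(messageId);
    }
}
```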

      We have three options for handling this case.

      1. Throw a JMSException notifying the application that this message was received in the previous session. This tells the application that the acknowledgement did not succeed and that the message will be redelivered.

      2. Do not throw an exception at all. The application must already be prepared to handle duplicates, so this would not be an issue: with JMS, the last acknowledged message is always in doubt. If the application is using CLIENT_ACK and acknowledges after every 'n' messages, it should be prepared to handle 'n' duplicates in the event of a failover.

      3. The client library can make failover totally transparent by not throwing an exception at all.
      Instead, it can look up this message (along with all the other unacked messages up to that point) among the messages received into its dispatch queue after failover. The messages can be identified by their message ID (and they will also be marked as redelivered by the broker).

      It can then acknowledge these messages and remove them from the dispatch queue, i.e. they will not be redelivered to the application at all.
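      A minimal in-memory sketch of option 3, assuming hypothetical Msg and TransparentAcker types rather than the Qpid client's real dispatcher classes: when the application acknowledges a message it received before failover, the client finds the redelivered copies by message ID in the post-failover dispatch queue, acknowledges them on the new session (modelled here by recording the ID) and removes them so they never reach the application again. Following JMS CLIENT_ACKNOWLEDGE semantics, the sketch treats the acknowledge as covering every message delivered before failover.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical minimal message: an ID plus the broker's redelivered flag.
class Msg {
    final String id;
    final boolean redelivered;

    Msg(String id, boolean redelivered) {
        this.id = id;
        this.redelivered = redelivered;
    }
}

// Hypothetical sketch of option 3; not the Qpid client's real dispatcher.
class TransparentAcker {
    // Delivered to the application before failover and not yet acknowledged.
    private final List<String> unackedBeforeFailover = new ArrayList<>();
    // Messages the broker sent after failover, not yet given to the application.
    final Deque<Msg> dispatchQueue = new ArrayDeque<>();
    // IDs acknowledged on the new AMQP session (stands in for the real ack).
    final List<String> ackedOnNewSession = new ArrayList<>();

    void deliveredBeforeFailover(String id) {
        unackedBeforeFailover.add(id);
    }

    // Called when the application acknowledges a message it got before failover.
    void acknowledgeOld(String id) {
        if (!unackedBeforeFailover.contains(id)) {
            return; // nothing outstanding from the old session
        }
        // CLIENT_ACKNOWLEDGE covers every message the session delivered so far,
        // so all pre-failover messages are acknowledged, not just this one.
        Set<String> toAck = new HashSet<>(unackedBeforeFailover);
        unackedBeforeFailover.clear();
        for (Iterator<Msg> it = dispatchQueue.iterator(); it.hasNext(); ) {
            Msg m = it.next();
            // Match redelivered copies by message ID.
            if (m.redelivered && toAck.contains(m.id)) {
                ackedOnNewSession.add(m.id);
                it.remove(); // the application will never see it again
            }
        }
    }
}
```

      The sketch does not handle a redelivered copy that was already dispatched to the application before acknowledge() was called; a real implementation would have to decide what to do in that case.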

      What do you think is the best option?

      Regards,

      Rajith

        Activity

        Rajith Attapattu created issue -
        Rajith Attapattu added a comment - - edited

        When considering failover with respect to CLIENT_ACKNOWLEDGE, we need to
        consider the following three cases:

        1. The last acknowledgement is in doubt after failover.

        Message m = consumer.receive();
        m.acknowledge();
        // client fails over due to broker crash
        
        Message m2 = consumer.receive();
        m2.acknowledge();
        

        2. Calling acknowledge() on a message received prior to failover.

        Message m = consumer.receive();
        
        // The application does some work while the JMS client fails over.
        
        m.acknowledge();
        

        3. Calling acknowledge on a message received after failover, but implicitly
        acknowledging messages received prior to failover.

        Message m = consumer.receive();
        // client fails over due to broker crash
        
        Message m2 = consumer.receive();
        m2.acknowledge();
        

        In the first case, if m.acknowledge() returns, the JMS client library needs to
        ensure that no messages up to that point will be replayed. So the acknowledge()
        method needs to be synchronous (in 0-10 terms, call sync()).
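        A minimal sketch of a synchronous acknowledge, assuming a hypothetical AckTransport whose sendAck() returns a CompletableFuture that completes when the broker confirms the acknowledgement (the role sync() plays in 0-10) and fails if the connection drops. acknowledge() returns only after that confirmation; otherwise it throws a hypothetical AckNotConfirmedException (standing in for javax.jms.JMSException) and keeps the messages pending, so the application knows they may come back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical transport: the future completes once the broker confirms the
// ack, or completes exceptionally if the connection is lost first.
interface AckTransport {
    CompletableFuture<Void> sendAck(List<String> messageIds);
}

// Hypothetical stand-in for javax.jms.JMSException.
class AckNotConfirmedException extends Exception {
    AckNotConfirmedException(String message, Throwable cause) {
        super(message, cause);
    }
}

class SyncAcker {
    private final AckTransport transport;
    private final List<String> pending = new ArrayList<>();

    SyncAcker(AckTransport transport) {
        this.transport = transport;
    }

    void delivered(String messageId) {
        pending.add(messageId);
    }

    // Returns only after the broker has confirmed the ack. On failure it throws
    // and keeps the messages pending: the application must expect them again.
    void acknowledge(long timeoutMillis) throws AckNotConfirmedException {
        List<String> batch = new ArrayList<>(pending);
        try {
            transport.sendAck(batch).get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (ExecutionException | TimeoutException e) {
            throw new AckNotConfirmedException(
                batch.size() + " message(s) not confirmed; expect redelivery", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AckNotConfirmedException("interrupted while waiting for confirmation", e);
        }
        pending.removeAll(batch);
    }
}
```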

        If it does not return before failover, the application needs to be ready to
        handle duplicates for all unacked messages, and the acknowledge method needs to
        throw an exception to signal to the application that it failed.
        The JMS client and broker will then redeliver all unacked messages in the same
        order.
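        On the application side, being ready to handle duplicates usually means an idempotent consumer. A minimal sketch, assuming a hypothetical IdempotentProcessor that remembers processed message IDs in memory and relies on the broker flagging redeliveries (as the description notes it does): a redelivered message whose ID has already been processed is skipped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical application-side consumer that tolerates redelivery after failover.
class IdempotentProcessor {
    private final Set<String> processedIds = new HashSet<>();
    final List<String> results = new ArrayList<>();

    // Returns true if the message was processed, false if it was a duplicate.
    boolean onMessage(String messageId, boolean redelivered, String body) {
        // Only a message flagged as redelivered can be a duplicate, so the
        // lookup is skipped for first-time deliveries.
        if (redelivered && processedIds.contains(messageId)) {
            return false;
        }
        processedIds.add(messageId);
        results.add(body);
        return true;
    }
}
```

        The in-memory set grows without bound and is lost on restart; a real application would bound it or keep it in the same store as its processing results.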

        In the second case, the client needs to throw a JMSException.
        The next time the application tries to receive a message, it will get the one
        after the oldest acked message.

        In the third case, the JMS client library needs to ensure that 'm' (the last
        message received before failover) is redelivered,
        i.e. 'm2' is the same message as 'm'.
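        The ordering guarantee behind 'm2' being the same message as 'm' can be modelled with a hypothetical in-memory queue (QueueModel, not a Qpid class) standing in for the combined client and broker state of one consumer: when the old AMQP session is lost, every unacknowledged delivery goes back to the head of the queue in its original order and is flagged as redelivered, so the next receive() returns 'm' again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of client + broker state for one consumer.
class QueueModel {
    static final class Delivery {
        final String id;
        final boolean redelivered;

        Delivery(String id, boolean redelivered) {
            this.id = id;
            this.redelivered = redelivered;
        }
    }

    private final Deque<String> available = new ArrayDeque<>();
    // Delivered to the application but not yet acknowledged, in delivery order.
    private final List<String> unacked = new ArrayList<>();
    private final Set<String> flaggedRedelivered = new HashSet<>();

    void publish(String id) {
        available.addLast(id);
    }

    Delivery receive() {
        String id = available.pollFirst();
        if (id == null) {
            return null;
        }
        unacked.add(id);
        // Set.remove returns true if the ID was flagged, and clears the flag.
        return new Delivery(id, flaggedRedelivered.remove(id));
    }

    // CLIENT_ACKNOWLEDGE: acknowledges everything delivered so far.
    void acknowledgeAll() {
        unacked.clear();
    }

    // The old AMQP session is gone: push unacked messages back onto the head of
    // the queue in their original order and flag them as redelivered.
    void sessionLost() {
        for (int i = unacked.size() - 1; i >= 0; i--) {
            available.addFirst(unacked.get(i));
            flaggedRedelivered.add(unacked.get(i));
        }
        unacked.clear();
    }
}
```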

        When acknowledge() is called on m2, the JMS client library should throw a
        JMSException.

        The next time the application tries to receive a message, it will get the one
        after the oldest acked message.

        Simply calling recover() is not a sufficient solution to this problem.
        Besides, there are subtle issues with the current recover() implementation
        within a JCA context.

        Therefore, we need to consider all options carefully before attempting a fix,
        and the required changes need to be carefully tested and evaluated.

        I recommend we de-scope this JIRA from MRG 2.0.3 errata.

        Rajith Attapattu made changes -
        Field           Original Value        New Value
        Fix Version/s   0.14 [ 12316855 ]     0.15 [ 12319043 ]
        Rob Godfrey made changes -
        Labels: failover
        Robbie Gemmell added a comment -

        The status of this JIRA is unclear, but significant changes were made to CLIENT_ACK handling last year, e.g. https://issues.apache.org/jira/browse/QPID-3526

        Rajith Attapattu made changes -
        Field           Original Value        New Value
        Fix Version/s   0.15 [ 12319043 ]     Future [ 12315490 ]

          People

          • Assignee: Rajith Attapattu
          • Reporter: Rajith Attapattu
          • Votes: 0
          • Watchers: 1
