ActiveMQ
  1. ActiveMQ
  2. AMQ-1837

Lost messages after broker recovery

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Duplicate
    • Affects Version/s: 5.1.0
    • Fix Version/s: 5.3.0
    • Component/s: Message Store
    • Labels:
      None

      Description

      We experienced that some messages (send in persistent mode) were lost after the message broker was killed and restarted.
      The attached little test simulates the situation:
      There are 10 queues, 10 sender and 10 receiver threads each sending / receiving quite frequently to / from one queue for a quite long time.
      The receiver for each queue expects the messages exactly in the order the sender has send the messages.
      After restarting (killing) the broker the order of the received messages gets disturbed or messages get lost.

      • Start the broker as configures by default in the distribution (only switch on JMX support).
      • Start the main method of the attached class
      • Wait until some messages are received
      • kill or shutdown the broker
      • restart the broker
      • have a look at the output of the test class
        Output
        Exception in ReceiverThread: java.lang.RuntimeException: Received unexpected message! Expected: QUEUE-4: #3197, Received: QUEUE-4: #3267
        
      Broker Output
      ...
      INFO  BrokerService                  - Using Persistence Adapter: AMQPersistenceAdapter(D:\sandbox\Experimental\Activemq\apache-activemq-5.2-SNAPSHOT\bin\..\data)
      INFO  AMQPersistenceAdapter          - AMQStore starting using directory: D:\sandbox\Experimental\Activemq\apache-activemq-5.2-SNAPSHOT\bin\..\data
      INFO  KahaStore                      - Kaha Store using data directory D:\sandbox\Experimental\Activemq\apache-activemq-5.2-SNAPSHOT\bin\..\data\kr-store\state
      INFO  AMQPersistenceAdapter          - Active data files: []
      WARN  AMQPersistenceAdapter          - The ReferenceStore is not valid - recovering ...
      INFO  KahaStore                      - Kaha Store successfully deleted data directory D:\sandbox\Experimental\Activemq\apache-activemq-5.2-SNAPSHOT\bin\..\data\kr-store\data
      INFO  AMQPersistenceAdapter          - Journal Recovery Started from: DataManager:(data-)
      INFO  KahaStore                      - Kaha Store using data directory D:\sandbox\Experimental\Activemq\apache-activemq-5.2-SNAPSHOT\bin\..\data\kr-store\data
      INFO  AMQPersistenceAdapter          - Recovered 198261 operations from redo log in 18.028 seconds.
      INFO  AMQPersistenceAdapter          - Finished recovering the ReferenceStore
      INFO  BrokerService                  - ActiveMQ 5.2-SNAPSHOT JMS Message Broker (localhost) is starting
      ...
      
      1. ASF.LICENSE.NOT.GRANTED--src.zip
        2 kB
        Jan Wiemer
      2. MultithreadedQueueTest2.zip
        4 kB
        Helmut Janknecht

        Activity

        Hide
        Jan Wiemer added a comment -

        By the way I used: apache-activemq-5.2-20080619.135742-2-bin.zip

        Show
        Jan Wiemer added a comment - By the way I used: apache-activemq-5.2-20080619.135742-2-bin.zip
        Hide
        Jan Wiemer added a comment -

        Note that the problem seems not to occur when working with a single queue (and one sender thread and one receiver thread).

        Show
        Jan Wiemer added a comment - Note that the problem seems not to occur when working with a single queue (and one sender thread and one receiver thread).
        Hide
        Helmut Janknecht added a comment -

        I changed code a little bit and do the receive with a 120s timeout in order to avoid race conditions there.
        Addtionally I now sent object messages with approx. 2K size. With prefetch size set to default it fails often after Kaha recovery. With low prefetch size 5 or 10 failure rate is lower but not 0...

        Show
        Helmut Janknecht added a comment - I changed code a little bit and do the receive with a 120s timeout in order to avoid race conditions there. Addtionally I now sent object messages with approx. 2K size. With prefetch size set to default it fails often after Kaha recovery. With low prefetch size 5 or 10 failure rate is lower but not 0...
        Hide
        Helmut Janknecht added a comment - - edited

        One example log of a failure, here actually without a restart:

        QUEUE-2 currently sent message #10000
        QUEUE-8 currently received message #9000
        QUEUE-4 currently sent message #10000
        QUEUE-5 currently sent message #10000
        QUEUE-9 currently sent message #10000
        QUEUE-0 currently received message #10000
        QUEUE-3 currently sent message #10000
        QUEUE-4 currently received message #10000
        QUEUE-2 currently received message #10000
        Unexpected message from QUEUE-8! Expected: QUEUE-8: #9342, Received: QUEUE-8: #9343
        

        It skipped one message #9342.
        Browsing QUEUE-8 with JMX showed:

        ----------------------------------------------------------------------
        ID:dono-janknecht-4560-1215150766850-0:6:1:1:9344: QUEUE-8: #9344
        ID:dono-janknecht-4560-1215150766850-0:6:1:1:9345: QUEUE-8: #9345
        ID:dono-janknecht-4560-1215150766850-0:6:1:1:9346: QUEUE-8: #9346
        ID:dono-janknecht-4560-1215150766850-0:6:1:1:9347: QUEUE-8: #9347
        ID:dono-janknecht-4560-1215150766850-0:6:1:1:9348: QUEUE-8: #9348
        ID:dono-janknecht-4560-1215150766850-0:6:1:1:9350: QUEUE-8: #9350
        ...
        ID:dono-janknecht-4560-1215150766850-0:6:1:1:9474: QUEUE-8: #9474
        ID:dono-janknecht-4560-1215150766850-0:6:1:1:9475: QUEUE-8: #9475
        ----------------------------------------------------------------------
        

        So QUEUE-8 does not contain this element.

        Then I restarted the broker and now surprisingly the missing "QUEUE-8: #9342" appears, the "JMSRedelivered" is set to false.

        So it seems that under heavy load in multi threaded scenarios persistent messages somtimes stuck.
        With DeliveryMode.NON_PERSISTENT it seems to work.

        Show
        Helmut Janknecht added a comment - - edited One example log of a failure, here actually without a restart: QUEUE-2 currently sent message #10000 QUEUE-8 currently received message #9000 QUEUE-4 currently sent message #10000 QUEUE-5 currently sent message #10000 QUEUE-9 currently sent message #10000 QUEUE-0 currently received message #10000 QUEUE-3 currently sent message #10000 QUEUE-4 currently received message #10000 QUEUE-2 currently received message #10000 Unexpected message from QUEUE-8! Expected: QUEUE-8: #9342, Received: QUEUE-8: #9343 It skipped one message #9342. Browsing QUEUE-8 with JMX showed: ---------------------------------------------------------------------- ID:dono-janknecht-4560-1215150766850-0:6:1:1:9344: QUEUE-8: #9344 ID:dono-janknecht-4560-1215150766850-0:6:1:1:9345: QUEUE-8: #9345 ID:dono-janknecht-4560-1215150766850-0:6:1:1:9346: QUEUE-8: #9346 ID:dono-janknecht-4560-1215150766850-0:6:1:1:9347: QUEUE-8: #9347 ID:dono-janknecht-4560-1215150766850-0:6:1:1:9348: QUEUE-8: #9348 ID:dono-janknecht-4560-1215150766850-0:6:1:1:9350: QUEUE-8: #9350 ... ID:dono-janknecht-4560-1215150766850-0:6:1:1:9474: QUEUE-8: #9474 ID:dono-janknecht-4560-1215150766850-0:6:1:1:9475: QUEUE-8: #9475 ---------------------------------------------------------------------- So QUEUE-8 does not contain this element. Then I restarted the broker and now surprisingly the missing "QUEUE-8: #9342" appears, the "JMSRedelivered" is set to false. So it seems that under heavy load in multi threaded scenarios persistent messages somtimes stuck. With DeliveryMode.NON_PERSISTENT it seems to work.
        Hide
        Helmut Janknecht added a comment -

        Any news here about this issue?
        I retried with latest apache-activemq-5.2-20080713.231541-8 and same problem: after a few seconds one of receiver thread skips a message:

        ...
        Q2 currently received: Q2: #2000
        Q0 currently received: Q0: #2000
        Unexpected message from Q4! Expected: Q4: #2155, Received: Q4: #2156 (JMSRedelivered=false)
        Unexpected message from Q3! Expected: Q3: #2177, Received: Q3: #2178 (JMSRedelivered=false)
        ----------------------------------------------------------------------
        ID:dono-janknecht-3517-1216026084546-0:0:1:1:2158: Q4: #2158
        ID:dono-janknecht-3517-1216026084546-0:0:1:1:2160: Q4: #2160
        ID:dono-janknecht-3517-1216026084546-0:0:1:1:2161: Q4: #2161
        ID:dono-janknecht-3517-1216026084546-0:0:1:1:2164: Q4: #2164
        ID:dono-janknecht-3517-1216026084546-0:0:1:1:2166: Q4: #2166
        ...
        

        And this again without any restart or so but with PERSISTENT messages.
        Here it skipped #2155 in Q4 and JMX browsing shows that also #2159 is missing.
        If you then stop broker normally and restart these two messages are there again!?

        I tried same scenario for example with Sun's OpenMQ 4.2 and there it works.
        For us this is a real show stopper because we rely on strict queue message ordering...

        Show
        Helmut Janknecht added a comment - Any news here about this issue? I retried with latest apache-activemq-5.2-20080713.231541-8 and same problem: after a few seconds one of receiver thread skips a message: ... Q2 currently received: Q2: #2000 Q0 currently received: Q0: #2000 Unexpected message from Q4! Expected: Q4: #2155, Received: Q4: #2156 (JMSRedelivered=false) Unexpected message from Q3! Expected: Q3: #2177, Received: Q3: #2178 (JMSRedelivered=false) ---------------------------------------------------------------------- ID:dono-janknecht-3517-1216026084546-0:0:1:1:2158: Q4: #2158 ID:dono-janknecht-3517-1216026084546-0:0:1:1:2160: Q4: #2160 ID:dono-janknecht-3517-1216026084546-0:0:1:1:2161: Q4: #2161 ID:dono-janknecht-3517-1216026084546-0:0:1:1:2164: Q4: #2164 ID:dono-janknecht-3517-1216026084546-0:0:1:1:2166: Q4: #2166 ... And this again without any restart or so but with PERSISTENT messages. Here it skipped #2155 in Q4 and JMX browsing shows that also #2159 is missing. If you then stop broker normally and restart these two messages are there again!? I tried same scenario for example with Sun's OpenMQ 4.2 and there it works. For us this is a real show stopper because we rely on strict queue message ordering...
        Hide
        Gary Tully added a comment -
        Show
        Gary Tully added a comment - this should be resolved on trunk by: http://svn.apache.org/viewvc?rev=758678&view=rev Related issue: https://issues.apache.org/activemq/browse/AMQ-2149

          People

          • Assignee:
            Unassigned
            Reporter:
            Jan Wiemer
          • Votes:
            2 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development