QPID-6278

HA broker abort in TXN soak test

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.30
    • Fix Version/s: 0.31
    • Component/s: C++ Clustering
    • Labels:
      None

      Description

      see also https://bugzilla.redhat.com/show_bug.cgi?id=1145386

      I have a repeatable crash in the primary HA broker, triggered by a soak test on transactions (TXNs).

      This is with trunk code that was current as of an hour ago:

      URL: https://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp
      Repository Root: https://svn.apache.org/repos/asf
      Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
      Revision: 1626916
      Node Kind: directory
      Schedule: normal
      Last Changed Author: aconway
      Last Changed Rev: 1626887

      I did a standard build, first of proton and then of qpidd, except that I had them install themselves in /usr instead of /usr/local.

      Here are the scripts I use.

      script 1
      starting the HA cluster
      {
      #! /bin/bash

      export PYTHONPATH=/home/mick/trunk/qpid/python

      QPIDD=/usr/sbin/qpidd
      QPID_HA=/home/mick/trunk/qpid/tools/src/py/qpid-ha

      # This is where I put the log files.
      rm -rf /tmp/mick
      mkdir /tmp/mick

      for N in 1 2 3
      do
      $QPIDD \
      --auth=no \
      --no-module-dir \
      --load-module /usr/lib64/qpid/daemon/ha.so \
      --log-enable debug+:ha:: \
      --ha-cluster=yes \
      --ha-replicate=all \
      --ha-brokers-url=localhost:5801,localhost:5802,localhost:5803 \
      --ha-public-url=localhost:5801,localhost:5802,localhost:5803 \
      -p 580$N \
      --data-dir /tmp/mick/data_$N \
      --log-to-file /tmp/mick/qpidd_$N.log \
      --mgmt-enable=yes \
      -d
      echo "============================================"
      echo "started broker $N from $QPIDD"
      echo "============================================"
      sleep 1
      done

      # Now promote one broker to primary.
      echo "Promoting broker 5801..."
      ${QPID_HA} promote --cluster-manager -b localhost:5801
      echo "done."

      }

      script 2
      create the tx queues, and load the first one with 1000 messages
      {
      #! /bin/bash

      TXTEST2=/usr/libexec/qpid/tests/qpid-txtest2

      echo "Loading data to queues..."
      ${TXTEST2} --init=yes --transfer=no --check=no \
      --port 5801 \
      --total-messages 1000 --connection-options '{reconnect:true}' \
      --messages-per-tx 10 --tx-count 100 \
      --queue-base-name=tx --fetch-timeout=1
      }

      script 3
      now beat the heck out of the TXN code
      {
      #! /bin/bash

      TXTEST2=/usr/libexec/qpid/tests/qpid-txtest2

      echo "starting transfers..."
      ${TXTEST2} --init=no --transfer=yes --check=no \
      --port 5801 \
      --total-messages 5000000 --connection-options '{reconnect:true}' \
      --messages-per-tx 10 --tx-count 500000 \
      --queue-base-name=tx --fetch-timeout=1

      }

      I do not do any failovers. Just let that TXN-exercising script run until the primary broker dies.
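
      Because the run can take a long time before the primary aborts, it may help to record when the broker process actually disappears. The following is only a minimal sketch and was not part of the original test: wait_for_exit is a hypothetical helper that polls a PID with kill -0 and returns once the process is gone. How the broker's PID is obtained is left as an assumption (for example, pgrep -f against the broker's command line); a short-lived sleep stands in for the broker here.

```shell
#!/bin/bash
# Hypothetical helper: block until the process with the given PID is gone.
# kill -0 delivers no signal; it only tests whether the PID can be signalled.
wait_for_exit() {
    local pid=$1 interval=${2:-1}
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
    done
    echo "process $pid exited at $(date '+%Y-%m-%d %H:%M:%S')"
}

# Stand-in for the broker so the sketch runs anywhere: a short-lived sleep.
sleep 2 &
wait_for_exit $! 1
```

      Run alongside script 3, the printed timestamp can then be matched against the last entries in the broker's log file.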

      It took quite a while. In my most recent test, I got through something like 300,000 transactions (10 messages each) before the broker became brokest.

      I then tried the same test on a standalone broker and it got all the way through.

      Here is the traceback:

      #0 0x0000003186a328a5 in raise () from /lib64/libc.so.6
      #1 0x0000003186a34085 in abort () from /lib64/libc.so.6
      #2 0x0000003186a2ba1e in __assert_fail_base () from /lib64/libc.so.6
      #3 0x0000003186a2bae0 in __assert_fail () from /lib64/libc.so.6
      #4 0x00007f6bb72b4f16 in operator-> (this=0x7f6b48378060, sync=<value optimized out>)
      at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:166
      #5 qpid::broker::SessionState::IncompleteIngressMsgXfer::completed (this=0x7f6b48378060,
      sync=<value optimized out>) at /home/mick/trunk/qpid/cpp/src/qpid/broker/SessionState.cpp:409
      #6 0x00007f6bb726d670 in invokeCallback (this=<value optimized out>, msg=<value optimized out>)
      at /home/mick/trunk/qpid/cpp/src/qpid/broker/AsyncCompletion.h:117
      #7 finishCompleter (this=<value optimized out>, msg=<value optimized out>)
      at /home/mick/trunk/qpid/cpp/src/qpid/broker/AsyncCompletion.h:158
      #8 enqueueComplete (this=<value optimized out>, msg=<value optimized out>)
      at /home/mick/trunk/qpid/cpp/src/qpid/broker/PersistableMessage.h:76
      #9 qpid::broker::NullMessageStore::enqueue (this=<value optimized out>, msg=<value optimized out>)
      at /home/mick/trunk/qpid/cpp/src/qpid/broker/NullMessageStore.cpp:97
      #10 0x00007f6bb71f4578 in qpid::broker::Queue::enqueue (this=0x7f6b4801ef90, ctxt=0x7f6b6821bdf0, msg=...)
      at /home/mick/trunk/qpid/cpp/src/qpid/broker/Queue.cpp:910
      #11 0x00007f6bb71f46db in qpid::broker::Queue::TxPublish::prepare (this=0x7f6b48435c70,
      ctxt=<value optimized out>) at /home/mick/trunk/qpid/cpp/src/qpid/broker/Queue.cpp:159
      #12 0x00007f6bb72c8b3d in qpid::broker::TxBuffer::prepare (this=0x7f6b68549120, ctxt=0x7f6b6821bdf0)
      at /home/mick/trunk/qpid/cpp/src/qpid/broker/TxBuffer.cpp:42
      #13 0x00007f6bb72c9dbe in qpid::broker::TxBuffer::startCommit (this=0x7f6b68549120,
      store=<value optimized out>) at /home/mick/trunk/qpid/cpp/src/qpid/broker/TxBuffer.cpp:73
      #14 0x00007f6bb7298c74 in qpid::broker::SemanticState::commit (this=0x7f6b6c567fb8, store=0x2460440)
      at /home/mick/trunk/qpid/cpp/src/qpid/broker/SemanticState.cpp:198
      #15 0x00007f6bb6c5886e in invoke<qpid::framing::AMQP_ServerOperations::TxHandler> (this=0x7f6b8bffd4a0,
      body=<value optimized out>) at /home/mick/trunk/qpid/cpp/build/src/qpid/framing/TxCommitBody.h:53
      #16 qpid::framing::AMQP_ServerOperations::TxHandler::Invoker::visit (this=0x7f6b8bffd4a0,
      body=<value optimized out>) at /home/mick/trunk/qpid/cpp/build/src/qpid/framing/ServerInvoker.cpp:582
      #17 0x00007f6bb6c5cc41 in qpid::framing::AMQP_ServerOperations::Invoker::visit (this=0x7f6b8bffd670, body=...)
      at /home/mick/trunk/qpid/cpp/build/src/qpid/framing/ServerInvoker.cpp:278
      #18 0x00007f6bb72b504c in invoke<qpid::broker::SessionAdapter> (this=<value optimized out>,
      method=0x7f6b68130790) at /home/mick/trunk/qpid/cpp/src/qpid/framing/Invoker.h:67
      #19 qpid::broker::SessionState::handleCommand (this=<value optimized out>, method=0x7f6b68130790)
      at /home/mick/trunk/qpid/cpp/src/qpid/broker/SessionState.cpp:198
      #20 0x00007f6bb72b6235 in qpid::broker::SessionState::handleIn (this=0x7f6b6c567df0, frame=...)
      at /home/mick/trunk/qpid/cpp/src/qpid/broker/SessionState.cpp:295
      #21 0x00007f6bb6cd5291 in qpid::amqp_0_10::SessionHandler::handleIn (this=0x7f6b6c4e2120, f=...)
      at /home/mick/trunk/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:93
      #22 0x00007f6bb722692b in operator() (this=0x7f6b500ab840, frame=...)
      at /home/mick/trunk/qpid/cpp/src/qpid/framing/Handler.h:39
      #23 qpid::broker::ConnectionHandler::handle (this=0x7f6b500ab840, frame=...)
      at /home/mick/trunk/qpid/cpp/src/qpid/broker/ConnectionHandler.cpp:94
      #24 0x00007f6bb7221ba8 in qpid::broker::amqp_0_10::Connection::received (this=0x7f6b500ab660, frame=...)
      at /home/mick/trunk/qpid/cpp/src/qpid/broker/amqp_0_10/Connection.cpp:198
      #25 0x00007f6bb71aea4d in qpid::amqp_0_10::Connection::decode (this=0x7f6b5005d770,
      buffer=<value optimized out>, size=<value optimized out>)
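
      Reading the trace from the top: frames #0 to #3 are libc's abort and assert machinery, frame #4 is operator-> in boost's intrusive_ptr.hpp, and frame #5 is the first broker frame. That suggests an intrusive_ptr was dereferenced while null inside IncompleteIngressMsgXfer::completed, though the trace alone does not show why. As a small aid for comparing core dumps from repeated runs, the sketch below pulls the first qpid:: frame out of a gdb backtrace; first_qpid_frame is a hypothetical helper name, and the sample input is a few frames copied from the trace above.

```shell
#!/bin/bash
# Hypothetical helper: print "<frame-number> <function>" for the first
# frame of a gdb backtrace (read from stdin) whose function is in qpid::.
first_qpid_frame() {
    sed -n -E 's/^[[:space:]]*#([0-9]+) +(0x[0-9a-f]+ in )?(qpid::[^ ]*) .*/\1 \3/p' | head -n 1
}

# A few frames copied from the trace above.
first_qpid_frame <<'EOF'
#3 0x0000003186a2bae0 in __assert_fail () from /lib64/libc.so.6
#4 0x00007f6bb72b4f16 in operator-> (this=0x7f6b48378060, sync=<value optimized out>)
#5 qpid::broker::SessionState::IncompleteIngressMsgXfer::completed (this=0x7f6b48378060,
#10 0x00007f6bb71f4578 in qpid::broker::Queue::enqueue (this=0x7f6b4801ef90, ctxt=0x7f6b6821bdf0, msg=...)
EOF
# prints: 5 qpid::broker::SessionState::IncompleteIngressMsgXfer::completed
```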

        Attachments

        1. ha-tx-race.diff
          2 kB
          Alan Conway

            People

            • Assignee:
              aconway Alan Conway
              Reporter:
              aconway Alan Conway