Qpid / QPID-7149

[HA] active HA broker memory leak when ring queue discards overflow messages

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: qpid-cpp-1.35.0
    • Component/s: C++ Broker
    • Labels: None

    Description

      There is a memory leak on the active HA broker, most probably triggered by purging overflow messages from a ring queue. The basic scenario is to set up an HA cluster, promote one broker to primary, and continuously feed a ring queue with messages.

      Detailed scenario:

      1) Start brokers and promote one to primary:

      start_broker() {
      	port=$1
      	shift
      	rm -rf _${port}
      	mkdir _${port}
      	nohup qpidd --load-module=ha.so --port=$port --log-to-file=qpidd.$port.log --data-dir=_${port} --auth=no --log-to-stderr=no --ha-cluster=yes --ha-brokers-url="$(hostname):5672,$(hostname):5673,$(hostname):5674" --ha-replicate=all --acl-file=/root/qpidd.acl "$@" > /dev/null 2>&1 &
      	sleep 1
      }
      
      
      killall qpidd qpid-receive 2> /dev/null
      rm -f qpidd.*.log
      start_broker 5672
      sleep 1
      qpid-ha promote -b $(hostname):5672 --cluster-manager
      sleep 1
      start_broker 5673
      sleep 1
      start_broker 5674
      

      2) Create ring queues and send messages to them (one queue is enough; more queues should expose the leak faster):

      for i in $(seq 0 9); do
      	qpid-config add queue FromKeyServer_$i --max-queue-size=10000 --max-queue-count=10 --limit-policy=ring --argument=x-qpid-priorities=10
      done
      
      while true; do
      	for j in $(seq 1 10); do
      		for i in $(seq 1 10); do
      			for k in $(seq 0 9); do
      				qpid-send -a FromKeyServer_$k -m 100 --send-rate=50 --priority=$((RANDOM % 10)) &
      			done
      		done
      		wait
      		while [ $(qpid-stat -q | grep broker-replicator | sed "s/Y//g" | awk '{ print $2 }' | sort -n | tail -n1) != "0" ]; do
      			sleep 1
      		done
      	done
      	date
      	ps aux | grep qpidd | grep "port=5672" | awk -F "--store-dir" '{ print $1 }'
      done
      

      (The "while [ $(qpid-stat -q | .." loop is there only to throttle message enqueues so that the replication federation queues do not build up a large backlog, which would interfere with observing memory consumption.)
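The throttling step can also be written as two small helpers, which makes it clearer what the pipeline extracts. This is only a sketch that reuses the pipeline from step 2; the function names max_replicator_depth and wait_for_drain are made up for illustration, and it assumes (as the original loop does) that the second column left after stripping the "Y" flags is the queue depth.

```shell
#!/bin/bash
# Read `qpid-stat -q` output on stdin and print the largest value found in the
# second column of the broker-replicator rows (same pipeline as in step 2).
max_replicator_depth() {
	grep broker-replicator | sed "s/Y//g" | awk '{ print $2 }' | sort -n | tail -n1
}

# Block until that value is 0, or until no broker-replicator row is listed.
wait_for_drain() {
	local depth
	while depth=$(qpid-stat -q | max_replicator_depth); [ -n "$depth" ] && [ "$depth" != "0" ]; do
		sleep 1
	done
}
```

In the feeding loop, the inner "while [ ... ]" block could then be replaced by a single call to wait_for_drain after "wait".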

      3) Run those scripts and monitor memory consumption.
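One possible way to monitor memory consumption over a long run is to sample the broker's resident set size at a fixed interval. The sketch below is not part of the original reproducer; the function names rss_kb and log_rss and the "epoch-seconds rss-kib" log format are assumptions chosen for illustration.

```shell
#!/bin/bash
# Print the resident set size (RSS, in KiB) of one process.
rss_kb() {
	ps -o rss= -p "$1" | tr -d ' '
}

# Append "epoch-seconds rss-kib" samples for a PID until the process exits.
# Arguments: PID, output file, sampling interval in seconds (default 60).
log_rss() {
	local pid=$1 out=$2 interval=${3:-60}
	while kill -0 "$pid" 2> /dev/null; do
		echo "$(date +%s) $(rss_kb "$pid")" >> "$out"
		sleep "$interval"
	done
}
```

For example, with the PID of the primary broker (the qpidd process started with --port=5672) in a variable, `log_rss "$primary_pid" rss.5672.log 60 &` would record one sample per minute in the background; the variable and file name are placeholders.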

      • The leak is also evident without priority queues and with messages sent without priorities; it is sometimes smaller, sometimes the same.
      • valgrind (on some older versions that I tested more thoroughly) detects nothing: neither leaked memory nor memory still reachable at shutdown.
      • The same leak is evident even with --ha-replicate=none.
      • The number of backup brokers does not affect the memory leak.
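To compare variants such as those listed above, the growth can be put into numbers rather than read off ps output by eye. The following is a minimal sketch that assumes a hypothetical sample file with one "epoch-seconds rss-kib" pair per line, oldest first; the function name rss_growth and that file format are not part of the original report.

```shell
#!/bin/bash
# Summarise RSS growth between the first and the last sample of a file that
# holds "epoch-seconds rss-kib" lines. Prints nothing if fewer than two
# samples exist or if no time has elapsed between them.
rss_growth() {
	awk 'NR == 1 { t0 = $1; r0 = $2 }
	     { t1 = $1; r1 = $2 }
	     END {
	         if (NR > 1 && t1 > t0)
	             printf "%d KiB over %d s (%.1f KiB/min)\n", r1 - r0, t1 - t0, (r1 - r0) * 60 / (t1 - t0)
	     }' "$1"
}
```

Running it once per variant (with and without priorities, with --ha-replicate=all and --ha-replicate=none) gives figures that can be compared directly, provided the runs use the same send rate and duration.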

            People

              Assignee: Alan Conway (aconway)
              Reporter: Pavel Moravec (pmoravec)