Qpid / QPID-6157

linearstore: segfault when 2 journals request new journal file from empty EFP


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.30
    • Fix Version/s: 0.31
    • Component/s: C++ Broker

    Description

      Description of problem:
      A broker using the linearstore module can segfault when:

      • the EFP (empty file pool) is empty
      • 2 journals concurrently request a new journal file from the EFP

      A race condition, described under Additional info, leads to the segfault.

      Version-Release number of selected component (if applicable):
      any

      How reproducible:
      100%, within a few minutes (on faster machines)

      Steps to Reproduce:
      Reproducer script:

      topics=10
      queues_per_topic=10

      rm -rf /var/lib/qpidd/* /tmp/qpidd.log
      service qpidd restart

      echo "$(date): creating $((topics * queues_per_topic)) queues"
      for i in $(seq 1 $topics); do
        for j in $(seq 1 $queues_per_topic); do
          qpid-receive -a "Durable_${i}_${j}; {create:always, node:{durable:true, x-bindings:[{exchange:'amq.direct', queue:'Durable_${i}_${j}', key:'${i}'}] }}" &
        done
      done
      wait

      echo "$(date): queues created"
      while true; do
        echo "$(date): publishing messages.."
        for i in $(seq 1 $topics); do
          qpid-send -a "amq.direct/${i}" -m 1000000 --durable=yes --content-size=1000 &
        done
        wait
        echo "$(date): consuming messages.."
        for i in $(seq 1 $topics); do
          for j in $(seq 1 $queues_per_topic); do
            qpid-receive -a "Durable_${i}_${j}" -m 1000000 --print-content=no &
          done
        done
        wait
      done

      #end of the script

      Actual results:
      Segfault with backtrace:

      Thread 1 (Thread 0x7ff85b3f1700 (LWP 17810)):
      #0 0x00007ff9927104f3 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
      No symbol table info available.
      #1 0x00007ff98e59d6a1 in operator= (this=0x1ab3480) at /usr/include/c++/4.4.7/bits/basic_string.h:511
      No locals.
      #2 qpid::linearstore::journal::EmptyFilePool::popEmptyFile (this=0x1ab3480)
      at /usr/src/debug/qpid-0.22/cpp/src/qpid/linearstore/journal/EmptyFilePool.cpp:213
      l = {_sm = @0x1ab34f8}
      emptyFileName = ""
      isEmpty = true
      #3 0x00007ff98e59ddec in qpid::linearstore::journal::EmptyFilePool::takeEmptyFile (this=0x1ab3480, destDirectory=
      "/var/lib/qpidd/qls/jrnl/DurableQueue")
      at /usr/src/debug/qpid-0.22/cpp/src/qpid/linearstore/journal/EmptyFilePool.cpp:108
      emptyFileName = ""
      newFileName = ""

      Expected results:
      no segfault

      Additional info:
      Relevant source code:

      std::string EmptyFilePool::popEmptyFile() {
          std::string emptyFileName;
          bool isEmpty = false;
          {
              slock l(emptyFileListMutex_);
              isEmpty = emptyFileList_.empty();
          }
          if (isEmpty) {
              createEmptyFile();
          }
          {
              slock l(emptyFileListMutex_);
              emptyFileName = emptyFileList_.front();   // <-- line 213
              emptyFileList_.pop_front();
          }
          return emptyFileName;
      }

      If two requests (R1 and R2) arrive concurrently while the EFP is empty, the following interleaving is possible:

      • R1 executes the function up to line 212 (the second lock), which means it has already created one empty file.
      • R2 executes the same part of the function, but the EFP now holds one file, so R2 does not create a new one.
      • R1 (or R2, the order does not matter) continues from line 212 and takes that empty file.
      • The other request then calls front() on the now-empty file list at line 213 and triggers the segfault.

          People

            Assignee: pmoravec Pavel Moravec
            Reporter: pmoravec Pavel Moravec