Qpid / QPID-2523

When reloading a large acl file while acl lookup is in progress, the broker core dumps

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: M4, 0.5, 0.6
    • Fix Version/s: 0.7
    • Component/s: C++ Broker
    • Labels:
      None

      Description

      Description of problem:
      When reloading a large acl file, the broker core dumps.
      This surfaced while running the attached reproducer.

      How reproducible:
      Always - reproducer attached.

      Steps to Reproduce:
      1. Start the broker with the acl module and --acl-file /tmp/policy.acl
      2. The initial acl file should contain only "acl allow all all"
      3. Run message_sender.py (this program keeps publishing to amq.direct)
      4. Run acl_reloader.py with --mode allow | deny a few times

      Actual results:
      The broker core dumps.

      Expected results:
      The broker should continue to work after reloading the acl file properly.

      Additional info:

      The following is the backtrace from the core dump.

      (gdb) bt
      #0 0x00cbe422 in __kernel_vsyscall ()
      #1 0x00183781 in raise () from /lib/libc.so.6
      #2 0x0018504a in abort () from /lib/libc.so.6
      #3 0x001c1619 in __libc_message () from /lib/libc.so.6
      #4 0x001c7a71 in malloc_printerr () from /lib/libc.so.6
      #5 0x001ca363 in munmap_chunk () from /lib/libc.so.6
      #6 0x040a1681 in operator delete(void*) () from /usr/lib/libstdc++.so.6
      #7 0x0035243e in qpid::acl::AclData::clear (this=0x8221328) at qpid/acl/AclData.cpp:40
      #8 0x003524ad in qpid::acl::AclData::~AclData (this=0x8221328, __in_chrg=<value optimized out>) at qpid/acl/AclData.cpp:259
      #9 0x003515a8 in checked_delete<qpid::acl::AclData> (x=<value optimized out>) at /usr/include/boost/checked_delete.hpp:34
      #10 boost::detail::sp_counted_impl_p<qpid::acl::AclData>::dispose (x=<value optimized out>) at /usr/include/boost/detail/sp_counted_impl.hpp:78
      #11 0x0034e50b in boost::detail::sp_counted_base::release (this=<value optimized out>) at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:145
      #12 ~shared_count (this=<value optimized out>) at /usr/include/boost/detail/shared_count.hpp:216
      #13 ~shared_ptr (this=<value optimized out>) at /usr/include/boost/shared_ptr.hpp:165
      #14 qpid::acl::Acl::authorise (this=<value optimized out>) at qpid/acl/Acl.cpp:86
      #15 0x00add720 in qpid::broker::SemanticState::route (this=0x82218a0, msg={p_ = 0xb5644868}, strategy=@0xb61fe178) at qpid/broker/SemanticState.cpp:447
      #16 0x00ade215 in qpid::broker::SemanticState::handle (this=0x82218a0, msg={p_ = 0xb5644868}) at qpid/broker/SemanticState.cpp:415
      ............

      1. QPID-2523-reproducer.tar.gz
        1 kB
        Rajith Attapattu
      2. QPID-2523.patch
        3 kB
        Rajith Attapattu
      3. valgrind-output
        4 kB
        Rajith Attapattu

        Activity

        Rajith Attapattu added a comment -

        This issue is likely to happen with a sufficiently large acl file (roughly 1000+ entries).
        With a smaller file (about 100 entries) it takes a few iterations to happen.

        Rajith Attapattu added a comment -

        After some investigation (see the attached valgrind output), the issue appears to be a race between
        lines 70 and 81 in Acl.cpp (rev 936028):
        boost::shared_ptr<AclData> dataLocal = data; //rcu copy

        and line 136 in Acl.cpp (rev 936028):
        data = d;

        According to the Boost documentation, reading and writing the same shared_ptr instance from different threads at the same time is not thread safe (see example #3):
        http://www.boost.org/doc/libs/1_42_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety

        The fix was to add a lock around the assignments to ensure they are atomic.
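
For illustration only, the locking pattern described above might look roughly like the sketch below. It is not the attached patch: it uses std::shared_ptr and std::mutex in place of boost::shared_ptr and the broker's own lock types, and the names Acl, AclData, dataLock, ruleCount, reload and stressReload are simplified or hypothetical stand-ins, not the real qpid::acl classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for qpid::acl::AclData: just a list of rule strings.
struct AclData {
    std::vector<std::string> rules;
};

// Hypothetical stand-in for qpid::acl::Acl, reduced to the shared pointer
// and the lock that guards every read and write of it.
class Acl {
public:
    Acl() : data(std::make_shared<AclData>()) {}

    // Lookup path: copy the pointer under the lock, then work on the
    // private copy without holding the lock.
    std::size_t ruleCount() const {
        std::shared_ptr<AclData> dataLocal;
        {
            std::lock_guard<std::mutex> guard(dataLock);
            dataLocal = data;  // guarded copy
        }
        assert(dataLocal);     // data is never reset to null in this sketch
        return dataLocal->rules.size();
    }

    // Reload path: build the new rule set first, then publish it under
    // the same lock that the lookup path takes.
    void reload(std::size_t entries) {
        std::shared_ptr<AclData> d = std::make_shared<AclData>();
        d->rules.assign(entries, "acl allow all all");
        std::lock_guard<std::mutex> guard(dataLock);
        data = d;              // guarded assignment
    }

private:
    mutable std::mutex dataLock;
    std::shared_ptr<AclData> data;
};

// Runs one reloading thread against one lookup thread and returns the
// number of lookups that saw a rule count other than the two published.
std::size_t stressReload(int iterations) {
    Acl acl;
    acl.reload(1);
    std::size_t bad = 0;  // written only by the reader, read after join()
    std::thread reader([&] {
        for (int i = 0; i < iterations; ++i) {
            std::size_t n = acl.ruleCount();
            if (n != 1 && n != 1000) ++bad;
        }
    });
    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i)
            acl.reload(i % 2 == 0 ? 1000 : 1);
    });
    reader.join();
    writer.join();
    return bad;
}
```

The lock is held only for the pointer copy or assignment, so lookups still run against their own snapshot of the rules while a reload is in progress. A snapshot that a reader still holds is freed only when that reader drops its copy.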

        Rajith Attapattu added a comment -

        Fixed and tested manually using the attached reproducer.


          People

          • Assignee:
            Rajith Attapattu
            Reporter:
            Rajith Attapattu