  1. Mesos
  2. MESOS-2451

Mesos C++ ZooKeeper code hangs on an API operation issued from within a watcher on a CHANGED event


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: c++ api
    • Labels:
      None
    • Environment:

      red hat linux 6.5

      Description

      We've observed that the Mesos 0.22.0-rc1 C++ ZooKeeper code appears to hang (two threads stuck in indefinite pthread condition waits) on a test case. As best we can tell, this is a Mesos issue and not an issue with the underlying Apache ZooKeeper C binding: we tried the same type of test case using the Apache ZooKeeper C binding directly and saw no issues.
      This happens with a properly running ZooKeeper (standalone is sufficient).

      Here's how we hung it:
      We issue a Mesos ZooKeeper set via

      int ZooKeeper::set ( const std::string & path,
      const std::string & data,
      int version
      )

      Then, inside a Watcher, on the CHANGED event we issue a Mesos ZooKeeper get on
      the same path via

      int ZooKeeper::get ( const std::string & path,
      bool watch,
      std::string * result,
      Stat * stat
      )

      We end up with two threads in the process, both blocked in pthread_cond_wait:
      #0 0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x00007f6664ee1cf5 in Gate::arrive (this=0x7f6140, old=0) at ../../../3rdparty/libprocess/src/gate.hpp:82
      #2 0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...) at ../../../3rdparty/libprocess/src/process.cpp:2476
      #3 0x00007f6664ed2ce9 in process::wait (pid=..., duration=...) at ../../../3rdparty/libprocess/src/process.cpp:2958
      #4 0x00007f6664e90558 in process::Latch::await (this=0x7f6ba0, duration=...) at ../../../3rdparty/libprocess/src/latch.cpp:49
      #5 0x00007f66649452cc in process::Future<int>::await (this=0x7fffa0fd9040, duration=...) at ../../3rdparty/libprocess/include/process/future.hpp:1156
      #6 0x00007f666493a04d in process::Future<int>::get (this=0x7fffa0fd9040) at ../../3rdparty/libprocess/include/process/future.hpp:1167
      #7 0x00007f6664ab1aac in ZooKeeper::set (this=0x803ce0, path="/craig/mo", data=
      ...

      and
      #0 0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x00007f6664ee1cf5 in Gate::arrive (this=0x7f66380013f0, old=0) at ../../../3rdparty/libprocess/src/gate.hpp:82
      #2 0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...) at ../../../3rdparty/libprocess/src/process.cpp:2476
      #3 0x00007f6664ed2ce9 in process::wait (pid=..., duration=...) at ../../../3rdparty/libprocess/src/process.cpp:2958
      #4 0x00007f6664e90558 in process::Latch::await (this=0x7f6638000d00, duration=...) at ../../../3rdparty/libprocess/src/latch.cpp:49
      #5 0x00007f66649452cc in process::Future<int>::await (this=0x7f66595fb6f0, duration=...) at ../../3rdparty/libprocess/include/process/future.hpp:1156
      #6 0x00007f666493a04d in process::Future<int>::get (this=0x7f66595fb6f0) at ../../3rdparty/libprocess/include/process/future.hpp:1167
      #7 0x00007f6664ab18d3 in ZooKeeper::get (this=0x803ce0, path="/craig/mo",
      watch=false,
      ....

      Separately, we have an "enhancement" suggestion: the Mesos C++ ZooKeeper API should use timed waits rather than blocking indefinitely for responses.
      In this case, though, we think the Mesos code is blocking on itself and not handling the responses.
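
The report does not establish why the two calls never return. As a rough illustration only, the sketch below models one way a "blocking on itself" hang could arise: it assumes a single worker thread both runs callbacks and completes requests, which is an assumption and not something confirmed about Mesos or libprocess. `Dispatcher` and `nestedRequestStatus` are hypothetical names, not part of any Mesos API. The nested wait uses `std::future::wait_for`, in the spirit of the timed-wait suggestion, so the model reports the stall instead of hanging.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single-threaded dispatcher: every queued task runs on one worker.
class Dispatcher {
public:
  Dispatcher() : worker_([this] { run(); }) {}

  ~Dispatcher() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    ready_.notify_one();
    worker_.join();  // the worker drains any remaining tasks before exiting
  }

  // Queue a task for the worker and return a future for its result.
  std::future<int> dispatch(std::function<int()> task) {
    auto packaged =
        std::make_shared<std::packaged_task<int()>>(std::move(task));
    std::future<int> result = packaged->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push([packaged] { (*packaged)(); });
    }
    ready_.notify_one();
    return result;
  }

private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }

  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the other members exist first
};

// Returns 1 if a request issued from inside a running callback completes
// within `timeout`, and 0 if the wait times out.
int nestedRequestStatus(std::chrono::milliseconds timeout) {
  Dispatcher dispatcher;
  // Stands in for the watcher callback that handles the CHANGED event.
  std::future<int> callback = dispatcher.dispatch([&dispatcher, timeout] {
    // Stands in for the get() issued from inside the watcher.
    std::future<int> nested = dispatcher.dispatch([] { return 42; });
    // An untimed nested.get() would block forever here: the only thread
    // able to run the nested task is the one executing this callback.
    return nested.wait_for(timeout) == std::future_status::ready ? 1 : 0;
  });
  return callback.get();
}
```

In this model, `nestedRequestStatus(std::chrono::milliseconds(100))` returns 0 because the nested task cannot start until the callback returns, while the same `dispatch(...).get()` issued from an outside thread completes normally. A timed wait turns the silent hang into a detectable error, but it does not remove the underlying self-dependency.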

      craig

        Attachments

        1. bug.cpp
          6 kB
          craig bordelon
        2. log.h
          3 kB
          craig bordelon
        3. Makefile
          1.0 kB
          craig bordelon
        4. bug0.cpp
          11 kB
          craig bordelon

              People

              • Assignee: Unassigned
              • Reporter: craig bordelon
              • Votes: 0
              • Watchers: 4
