Mesos / MESOS-2451

Mesos C++ ZooKeeper code hangs when an API operation is issued from within a watcher on a CHANGED event

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: c++ api
    • Labels: None
    • Environment: red hat linux 6.5

    Description

      We've observed that the Mesos 0.22.0-rc1 C++ ZooKeeper code appears to hang (two threads stuck in indefinite pthread condition waits) on a test case. As best we can tell, this is a Mesos issue and not an issue with the underlying Apache ZooKeeper C binding: we tried the same type of test case using the Apache ZooKeeper C binding directly and saw no issues.
      This happens with a properly running ZooKeeper (standalone is sufficient).

      Here's how we hung it:
      We issue a Mesos ZooKeeper set via

      int ZooKeeper::set(const std::string& path,
                         const std::string& data,
                         int version)

      Then, inside a Watcher, we handle the CHANGED event by issuing a Mesos ZooKeeper get on
      the same path via

      int ZooKeeper::get(const std::string& path,
                         bool watch,
                         std::string* result,
                         Stat* stat)

      We end up with two threads in the process, both in pthread_cond_wait:
      #0 0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from
      /lib64/libpthread.so.0
      #1 0x00007f6664ee1cf5 in Gate::arrive (this=0x7f6140, old=0)
      at ../../../3rdparty/libprocess/src/gate.hpp:82
      #2 0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
      at ../../../3rdparty/libprocess/src/process.cpp:2476
      #3 0x00007f6664ed2ce9 in process::wait (pid=..., duration=...)
      at ../../../3rdparty/libprocess/src/process.cpp:2958
      #4 0x00007f6664e90558 in process::Latch::await (this=0x7f6ba0, duration=...)
      at ../../../3rdparty/libprocess/src/latch.cpp:49
      #5 0x00007f66649452cc in process::Future<int>::await (this=0x7fffa0fd9040,
      duration=...)
      at ../../3rdparty/libprocess/include/process/future.hpp:1156
      #6 0x00007f666493a04d in process::Future<int>::get (this=0x7fffa0fd9040)
      at ../../3rdparty/libprocess/include/process/future.hpp:1167
      #7 0x00007f6664ab1aac in ZooKeeper::set (this=0x803ce0, path="/craig/mo", data=
      ...

      and
      #0 0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from
      /lib64/libpthread.so.0
      #1 0x00007f6664ee1cf5 in Gate::arrive (this=0x7f66380013f0, old=0)
      at ../../../3rdparty/libprocess/src/gate.hpp:82
      #2 0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
      at ../../../3rdparty/libprocess/src/process.cpp:2476
      #3 0x00007f6664ed2ce9 in process::wait (pid=..., duration=...)
      at ../../../3rdparty/libprocess/src/process.cpp:2958
      #4 0x00007f6664e90558 in process::Latch::await (this=0x7f6638000d00,
      duration=...)
      at ../../../3rdparty/libprocess/src/latch.cpp:49
      #5 0x00007f66649452cc in process::Future<int>::await (this=0x7f66595fb6f0,
      duration=...)
      at ../../3rdparty/libprocess/include/process/future.hpp:1156
      #6 0x00007f666493a04d in process::Future<int>::get (this=0x7f66595fb6f0)
      at ../../3rdparty/libprocess/include/process/future.hpp:1167
      #7 0x00007f6664ab18d3 in ZooKeeper::get (this=0x803ce0, path="/craig/mo",
      watch=false,
      ....
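
Both traces end in Future<int>::get, one under ZooKeeper::set and one under ZooKeeper::get. One pattern that produces this shape is a synchronous call issued from a callback running on the only thread that could complete it. This is offered as an unconfirmed illustration, not a diagnosis of the Mesos code. The sketch below models the pattern with the C++ standard library only; EventLoop, blockingCall and callFromCallback are hypothetical names, and a bounded wait is used so the example terminates instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a single-threaded event loop; this is NOT Mesos code.
class EventLoop {
public:
  EventLoop() : worker_([this] { run(); }) {}

  ~EventLoop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    ready_.notify_one();
    worker_.join();  // the worker drains any queued tasks before exiting
  }

  // Queues a task for the loop's single worker thread.
  void post(std::function<void()> task) {
    assert(task != nullptr);
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push_back(std::move(task));
    }
    ready_.notify_one();
  }

private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return done_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // shutdown requested and queue is empty
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();  // runs outside the lock, one task at a time
    }
  }

  std::mutex mutex_;
  std::condition_variable ready_;
  std::deque<std::function<void()>> tasks_;
  bool done_ = false;
  std::thread worker_;  // declared last so the other members exist before it starts
};

// Models a synchronous API call: post the work, then wait for its reply.
// Returns true if the reply arrived within `limit`.
bool blockingCall(EventLoop& loop, std::chrono::milliseconds limit) {
  auto reply = std::make_shared<std::promise<int>>();
  std::future<int> result = reply->get_future();
  loop.post([reply] { reply->set_value(0); });
  return result.wait_for(limit) == std::future_status::ready;
}

// Issues the same synchronous call from inside a callback on the loop thread.
bool callFromCallback(EventLoop& loop, std::chrono::milliseconds limit) {
  auto outcome = std::make_shared<std::promise<bool>>();
  std::future<bool> finished = outcome->get_future();
  loop.post([outcome, &loop, limit] {
    // The reply task is queued behind this callback, so it cannot run yet.
    outcome->set_value(blockingCall(loop, limit));
  });
  return finished.get();
}
```

Called from an ordinary thread, blockingCall returns true because the loop thread is free to run the reply task. Called through callFromCallback, it returns false once the limit expires, since the reply is queued behind the very callback that is waiting for it. With an unbounded wait in place of wait_for, that second case would never return.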

      Separately, we have an "enhancement" suggestion that the Mesos C++ ZooKeeper API use timed waits instead of blocking indefinitely for responses.
      In this case, however, we think the Mesos code is blocking on itself and not handling the responses.
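
To make the timed-wait suggestion concrete, here is a minimal sketch of a bounded wait, using std::future as a stand-in for process::Future. The helper name getWithin and its out-parameter shape are hypothetical and are not part of the Mesos or libprocess API.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <utility>

// Hypothetical helper: waits at most `limit` for `result` to become ready.
// On success, stores the value in *out and returns true.
// On timeout, leaves *out untouched and returns false; the future stays valid,
// so the caller may retry later.
template <typename T>
bool getWithin(std::future<T>& result, std::chrono::milliseconds limit, T* out) {
  assert(out != nullptr);
  if (result.wait_for(limit) != std::future_status::ready) {
    return false;  // caller decides whether to retry, report an error, or abort
  }
  *out = std::move(result.get());
  return true;
}
```

A caller using this shape gets an error path it can act on instead of a thread parked indefinitely in a condition wait. It would not remove the underlying self-blocking problem, only make it visible.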

      craig

      Attachments

        1. bug.cpp
          6 kB
          craig bordelon
        2. log.h
          3 kB
          craig bordelon
        3. Makefile
          1.0 kB
          craig bordelon
        4. bug0.cpp
          11 kB
          craig bordelon

            People

              Assignee: Unassigned
              Reporter: craig bordelon
              Votes: 0
              Watchers: 3
