Mesos / MESOS-36

race condition on MesosExecutorDriver destruction for Python

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Environment: OS X 10.6.8, Python 2.6.7

    Description

      There's a race condition between driver.stop/abort and the destruction of the Python driver object.

      For example:

      def main(args, options):
          thermos_executor = ThermosExecutor(options)
          mesos.MesosExecutorDriver(thermos_executor).run()

      will routinely cause the crash attached at the end of this description (a SIGABRT, per the report).

      If this is changed to:

      def main(args, options):
          thermos_executor = ThermosExecutor(options)
          drv = mesos.MesosExecutorDriver(thermos_executor)
          drv.run()

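      With this change the code works, which points to a problem in the implicit reference counting on the MesosExecutorDriver object. In the stack trace below, the driver is deallocated on Thread 1 from inside ProxyExecutor::launchTask (tupledealloc -> MesosExecutorDriverImpl_dealloc), and the process::wait call in ~MesosExecutorDriver then fails fatally.
      The launchTask in my example does some work, sends a framework message when the task is finished, then calls driver.stop().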
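
      The following is a minimal pure-Python sketch of the mechanism the trace suggests. It is not the Mesos bindings: FakeDriver, _dispatch, stop_when_done and the two run_with_* helpers are hypothetical names, and the behaviour relies on CPython's reference counting. It shows that when the caller keeps no named reference to the driver, the last reference can be the callback thread's argument tuple, so the finalizer runs on that thread.

```python
import threading


class FakeDriver(object):
    """Hypothetical stand-in for mesos.MesosExecutorDriver; not the real API."""

    def __init__(self, callback, main_released, log):
        self._callback = callback
        self._main_released = main_released
        self._log = log
        self._stopped = threading.Event()

    def run(self):
        # The callback thread receives the driver in its argument tuple,
        # much as launchTask receives the driver in the trace.
        worker = threading.Thread(target=_dispatch, args=(self,),
                                  name="callback-thread")
        worker.start()
        self._stopped.wait()  # returns once the callback calls stop()
        return worker

    def stop(self):
        self._stopped.set()

    def __del__(self):
        # Record which thread dropped the last reference.
        self._log.append(threading.current_thread().name)


def _dispatch(driver):
    driver._callback(driver)
    # Keep the callback "in flight" until the caller has left run().
    driver._main_released.wait()


def stop_when_done(driver):
    # Assumed callback body: finish the work, then stop the driver.
    driver.stop()


def run_with_temporary():
    log, released = [], threading.Event()
    # No name is bound to the driver, so the caller's reference
    # disappears as soon as run() returns.
    worker = FakeDriver(stop_when_done, released, log).run()
    released.set()
    worker.join()
    return log


def run_with_named_reference():
    log, released = [], threading.Event()
    drv = FakeDriver(stop_when_done, released, log)
    worker = drv.run()
    released.set()
    worker.join()
    assert log == []  # the driver outlived the callback thread
    del drv           # finalized here, on the calling thread
    return log
```

      Under CPython, run_with_temporary() logs the finalizer as running on "callback-thread", while run_with_named_reference() logs it on the calling thread. In the real bindings the driver destructor waits on its own libprocess process, which would explain why being finalized from inside a callback is fatal.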

      Process: Python [64054]
      Path: /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.app/Contents/MacOS/Python
      Identifier: Python
      Version: ??? (???)
      Code Type: X86-64 (Native)
      Parent Process: java [64016]

      Date/Time: 2011-10-07 14:30:01.103 -0700
      OS Version: Mac OS X 10.6.8 (10K549)
      Report Version: 6

      Interval Since Last Report: 944086 sec
      Crashes Since Last Report: 18
      Per-App Crashes Since Last Report: 12
      Anonymous UUID: 3A043B60-6C64-4A96-BA3A-C04C21BA960E

      Exception Type: EXC_CRASH (SIGABRT)
      Exception Codes: 0x0000000000000000, 0x0000000000000000
      Crashed Thread: 1

      Application Specific Information:
      abort() called

      Thread 0: Dispatch queue: com.apple.main-thread
      0 libSystem.B.dylib 0x00007fff869bba6a __semwait_signal + 10
      1 libSystem.B.dylib 0x00007fff869bf881 _pthread_cond_wait + 1286
      2 org.python.python 0x00000001000ef554 PyThread_acquire_lock + 116
      3 org.python.python 0x00000001000b499a PyEval_RestoreThread + 58
      4 org.python.python 0x00000001000f3f20 lock_PyThread_acquire_lock + 80
      5 org.python.python 0x00000001000bc558 PyEval_EvalFrameEx + 28696
      6 org.python.python 0x00000001000bd305 PyEval_EvalCodeEx + 2197
      7 org.python.python 0x00000001000bb29d PyEval_EvalFrameEx + 23901
      8 org.python.python 0x00000001000bb6fa PyEval_EvalFrameEx + 25018
      9 org.python.python 0x00000001000bb6fa PyEval_EvalFrameEx + 25018
      10 org.python.python 0x00000001000bd305 PyEval_EvalCodeEx + 2197
      11 org.python.python 0x000000010003b50d function_call + 429
      12 org.python.python 0x000000010000bb92 PyObject_Call + 98
      13 org.python.python 0x00000001000b76d2 PyEval_EvalFrameEx + 8594
      14 org.python.python 0x00000001000bd305 PyEval_EvalCodeEx + 2197
      15 org.python.python 0x000000010003b405 function_call + 165
      16 org.python.python 0x000000010000bb92 PyObject_Call + 98
      17 org.python.python 0x00000001000b43c7 PyEval_CallObjectWithKeywords + 87
      18 org.python.python 0x00000001000e202a Py_Finalize + 186
      19 org.python.python 0x00000001000e1b46 handle_system_exit + 246
      20 org.python.python 0x00000001000e1d95 PyErr_PrintEx + 437
      21 org.python.python 0x00000001000f16a4 RunModule + 404
      22 org.python.python 0x00000001000f2109 Py_Main + 2505
      23 org.python.python 0x0000000100000e22 0x100000000 + 3618
      24 org.python.python 0x0000000100000d41 0x100000000 + 3393

      Thread 1 Crashed:
      0 libSystem.B.dylib 0x00007fff869f39ce __semwait_signal_nocancel + 10
      1 libSystem.B.dylib 0x00007fff869f38d0 nanosleep$NOCANCEL + 129
      2 libSystem.B.dylib 0x00007fff86a503ce usleep$NOCANCEL + 57
      3 libSystem.B.dylib 0x00007fff86a6fa00 abort + 93
      4 _mesos.so 0x000000010190495c google::LogSink::~LogSink() + 0
      5 _mesos.so 0x000000010190473b google::LogMessage::Fail() + 13
      6 _mesos.so 0x0000000101909cae google::LogMessage::SendToLog() + 1212
      7 _mesos.so 0x00000001019066c6 google::LogMessage::Flush() + 418
      8 _mesos.so 0x0000000101907e5e google::LogMessageFatal::~LogMessageFatal() + 22
      9 _mesos.so 0x0000000101915e4c process::ProcessManager::wait(process::ProcessBase*, process::UPID const&) + 670
      10 _mesos.so 0x0000000101920eb9 process::wait(process::UPID const&, double) + 183
      11 _mesos.so 0x000000010186c359 process::wait(process::ProcessBase const*, double) + 45
      12 _mesos.so 0x000000010185d2c6 mesos::MesosExecutorDriver::~MesosExecutorDriver() + 98
      13 _mesos.so 0x000000010171791a mesos::python::MesosExecutorDriverImpl_dealloc(mesos::python::MesosExecutorDriverImpl*) + 42 (mesos_executor_driver_impl.cpp:159)
      14 org.python.python 0x0000000100069521 tupledealloc + 129
      15 org.python.python 0x0000000100010a0a PyObject_CallMethod + 474
      16 _mesos.so 0x000000010171b477 mesos::python::ProxyExecutor::launchTask(mesos::ExecutorDriver*, mesos::TaskDescription const&) + 103 (proxy_executor.cpp:68)
      17 _mesos.so 0x0000000101860641 boost::_mfi::mf2<void, mesos::Executor, mesos::ExecutorDriver*, mesos::TaskDescription const&>::operator()(mesos::Executor*, mesos::ExecutorDriver*, mesos::TaskDescription const&) const + 113
      18 _mesos.so 0x0000000101861010 void boost::_bi::list3<boost::_bi::value<mesos::Executor*>, boost::_bi::value<mesos::MesosExecutorDriver*>, boost::reference_wrapper<mesos::TaskDescription const> >::operator()<boost::_mfi::mf2<void, mesos::Executor, mesos::ExecutorDriver*, mesos::TaskDescription const&>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf2<void, mesos::Executor, mesos::ExecutorDriver*, mesos::TaskDescription const&>&, boost::_bi::list0&, int) + 118
      19 _mesos.so 0x0000000101861052 boost::_bi::bind_t<void, boost::_mfi::mf2<void, mesos::Executor, mesos::ExecutorDriver*, mesos::TaskDescription const&>, boost::_bi::list3<boost::_bi::value<mesos::Executor*>, boost::_bi::value<mesos::MesosExecutorDriver*>, boost::reference_wrapper<mesos::TaskDescription const> > >::operator()() + 54
      20 ??? 0x0000000401861071 0 + 17205432433
      21 libSystem.B.dylib 0x00007fff86a62dc9 setcontext + 25
      22 libSystem.B.dylib 0x00007fff869b9fd6 _pthread_start + 331
      23 libSystem.B.dylib 0x00007fff869b9e89 thread_start + 13

      Thread 2:
      0 libSystem.B.dylib 0x00007fff869c4932 select$DARWIN_EXTSN + 10
      1 _mesos.so 0x0000000101956518 select_poll + 168
      2 _mesos.so 0x0000000101957327 ev_loop + 631
      3 _mesos.so 0x00000001019195ff process::serve(void*) + 26
      4 libSystem.B.dylib 0x00007fff869b9fd6 _pthread_start + 331
      5 libSystem.B.dylib 0x00007fff869b9e89 thread_start + 13

      Thread 1 crashed with X86 Thread State (64-bit):
      rax: 0x000000000000003c rbx: 0x00000001023e3340 rcx: 0x00000001023e32f8 rdx: 0x0000000000000001
      rdi: 0x000000000000030f rsi: 0x0000000000000000 rbp: 0x00000001023e3330 rsp: 0x00000001023e32f8
      r8: 0x0000000000000000 r9: 0x0000000000989680 r10: 0x0000000000000001 r11: 0x0000000000000246
      r12: 0x0000000000000000 r13: 0x0000000102264cf8 r14: 0x0000000000000002 r15: 0x0000000100179110
      rip: 0x00007fff869f39ce rfl: 0x0000000000000247 cr2: 0x000000010158cc00

      Binary Images:
      0x100000000 - 0x100000ff7 +org.python.python 2.6.7 (2.6.7) <DE73C8D7-8FE7-91E3-E747-F9410238C08F> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.app/Contents/MacOS/Python
      0x100003000 - 0x100153ff7 +org.python.python 2.6.7, (c) 2004-2008 Python Software Foundation. (2.6.7) <F55830D0-BF78-CD2D-D7C3-8B06D61B89A2> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/Python
      0x1002fb000 - 0x1002fcfff +_json.so ??? (???) <E7E8F1AE-788B-4C9C-1F32-4563FDBB9B32> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/_json.so
      0x100440000 - 0x100443ff7 +zlib.so ??? (???) <7F0DCA3D-EE00-0742-4F3F-F858148BD0F8> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/zlib.so
      0x100488000 - 0x10048bfff +math.so ??? (???) <B8550672-AC88-C793-81A8-2C9B8F568541> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/math.so
      0x100491000 - 0x100492ff7 +time.so ??? (???) <5B4EB63E-F442-0C01-3EFA-85E1578C71BF> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/time.so
      0x100497000 - 0x10049aff7 +select.so ??? (???) <5F113BD9-E2E6-32FA-F5EC-8D1CB714D39D> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/select.so
      0x10049f000 - 0x1004a0ff7 +fcntl.so ??? (???) <0C585E11-AA6A-C307-43FD-52E1195AC9E6> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/fcntl.so
      0x1004e3000 - 0x1004e8ff7 +_struct.so ??? (???) <DE726454-5661-6198-09A3-914217769C47> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/_struct.so
      0x1004ef000 - 0x1004f1fe7 +binascii.so ??? (???) <F377A449-B4B9-3A96-3EC7-7D1DB272A0C2> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/binascii.so
      0x1004f5000 - 0x1004f6fff +cStringIO.so ??? (???) <F4762F1D-BC48-5D5F-8747-1FECBE43DFAB> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/cStringIO.so
      0x10053b000 - 0x10053cff7 +_hashlib.so ??? (???) <25BE55BB-FA03-671D-1697-6181AB32DB9B> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/_hashlib.so
      0x100540000 - 0x100541fff +termios.so ??? (???) <D2D1DE14-3799-B958-3EEB-70C834D1EEE0> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/termios.so
      0x100546000 - 0x100554fe7 +datetime.so ??? (???) <5975895D-4A18-EBFB-FFB9-B3C815D0116D> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/datetime.so
      0x100560000 - 0x100564fff +_collections.so ??? (???) <7C600BC4-8E88-B5D7-4C3F-C47BE74460BE> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/_collections.so
      0x10056a000 - 0x10056efff +operator.so ??? (???) <8B28D217-1C9C-19AA-3952-C1FF8FFA248A> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/operator.so
      0x100575000 - 0x100576fff +_random.so ??? (???) <574D30E0-2727-337F-F39C-1701CAA0F85A> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/_random.so
      0x1007e6000 - 0x1007e9ff7 +strop.so ??? (???) <15BD9A78-D5BE-6E04-5595-CD0132F5C32C> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/strop.so
      0x1007ee000 - 0x1007efff7 +_functools.so ??? (???) <8947EAB7-5F3E-412A-2EF6-81DE9BFDF702> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/_functools.so
      0x1007f2000 - 0x1007f4ff7 +_locale.so ??? (???) <3674FAA6-719B-8284-AABF-6381A8B97577> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/_locale.so
      0x1007fb000 - 0x1007fbfff +_weakref.so ??? (???) <89494552-9A1C-C2B3-5ADE-2F8A8DD6F977> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/_weakref.so
      0x101430000 - 0x101464fff +pyexpat.so ??? (???) <E028F381-F4B5-8148-84CF-9099B43A3A8B> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/pyexpat.so
      0x1014f6000 - 0x1014faff7 +_ssl.so ??? (???) <01812E71-E811-0EB0-70A6-7729BA0BDFEF> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/_ssl.so
      0x101700000 - 0x101b40fff +_mesos.so ??? (???) <5D8D6FDF-58AC-990A-AE20-EA1BFA36B739> /Users/wickman/.python-eggs/mesos-68-py2.6-macosx-10.4-x86_64.egg-tmp/_mesos.so
      0x102312000 - 0x102319fff +_socket.so ??? (???) <DD52D015-A984-CB02-BE51-EC8F9C0B2559> /Users/wickman/Local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-dynload/_socket.so
      0x7fff5fc00000 - 0x7fff5fc3bdef dyld 132.1 (???) <DB8B8AB0-0C97-B51C-BE8B-B79895735A33> /usr/lib/dyld
      0x7fff808ed000 - 0x7fff809a3ff7 libobjc.A.dylib 227.0.0 (compatibility 1.0.0) <03140531-3B2D-1EBA-DA7F-E12CC8F63969> /usr/lib/libobjc.A.dylib
      0x7fff8203b000 - 0x7fff82078ff7 libssl.0.9.8.dylib 0.9.8 (compatibility 0.9.8) <F743389F-F25A-A77D-4FCA-D6B01AF2EE6D> /usr/lib/libssl.0.9.8.dylib
      0x7fff83494000 - 0x7fff835b3fe7 libcrypto.0.9.8.dylib 0.9.8 (compatibility 0.9.8) <14115D29-432B-CF02-6B24-A60CC533A09E> /usr/lib/libcrypto.0.9.8.dylib
      0x7fff83a5d000 - 0x7fff83bd4fe7 com.apple.CoreFoundation 6.6.5 (550.43) <31A1C118-AD96-0A11-8BDF-BD55B9940EDC> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
      0x7fff84ea2000 - 0x7fff84ea3ff7 com.apple.TrustEvaluationAgent 1.1 (1) <A91CE5B9-3C63-5F8C-5052-95CCAB866F72> /System/Library/PrivateFrameworks/TrustEvaluationAgent.framework/Versions/A/TrustEvaluationAgent
      0x7fff86980000 - 0x7fff86b41fef libSystem.B.dylib 125.2.11 (compatibility 1.0.0) <9AB4F1D1-89DC-0E8A-DC8E-A4FE4D69DB69> /usr/lib/libSystem.B.dylib
      0x7fff86c32000 - 0x7fff86df0fff libicucore.A.dylib 40.0.0 (compatibility 1.0.0) <4274FC73-A257-3A56-4293-5968F3428854> /usr/lib/libicucore.A.dylib
      0x7fff87c06000 - 0x7fff87c0aff7 libmathCommon.A.dylib 315.0.0 (compatibility 1.0.0) <95718673-FEEE-B6ED-B127-BCDBDB60D4E5> /usr/lib/system/libmathCommon.A.dylib
      0x7fff88a86000 - 0x7fff88ad2fff libauto.dylib ??? (???) <F7221B46-DC4F-3153-CE61-7F52C8C293CF> /usr/lib/libauto.dylib
      0x7fff895bd000 - 0x7fff895ceff7 libz.1.dylib 1.2.3 (compatibility 1.0.0) <5BAFAE5C-2307-C27B-464D-582A10A6990B> /usr/lib/libz.1.dylib
      0x7fff89786000 - 0x7fff89803fef libstdc++.6.dylib 7.9.0 (compatibility 7.0.0) <35ECA411-2C08-FD7D-11B1-1B7A04921A5C> /usr/lib/libstdc++.6.dylib
      0x7fffffe00000 - 0x7fffffe01fff libSystem.B.dylib ??? (???) <9AB4F1D1-89DC-0E8A-DC8E-A4FE4D69DB69> /usr/lib/libSystem.B.dylib

      Model: MacBookAir3,1, BootROM MBA31.0061.B01, 2 processors, Intel Core 2 Duo, 1.6 GHz, 4 GB, SMC 1.67f4
      Graphics: NVIDIA GeForce 320M, NVIDIA GeForce 320M, PCI, 256 MB
      Memory Module: global_name
      AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0xD1), Broadcom BCM43xx 1.0 (5.10.131.42.4)
      Bluetooth: Version 2.4.5f3, 2 service, 12 devices, 1 incoming serial ports
      Network Service: AirPort, AirPort, en0
      Serial ATA Device: APPLE SSD SM128C, 113 GB
      USB Device: FaceTime Camera (Built-in), 0x05ac (Apple Inc.), 0x850a, 0x24600000 / 2
      USB Device: BRCM2070 Hub, 0x0a5c (Broadcom Corp.), 0x4500, 0x04500000 / 3
      USB Device: Bluetooth USB Host Controller, 0x05ac (Apple Inc.), 0x821b, 0x04530000 / 6
      USB Device: Apple Internal Keyboard / Trackpad, 0x05ac (Apple Inc.), 0x0242, 0x04300000 / 2

          People

            benjaminhindman Benjamin Hindman
            wickman Brian Wickman