ZooKeeper / ZOOKEEPER-740

zkpython leading to segfault on zookeeper

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.5.0
    • Component/s: None
    • Labels: None

      Description

      The program we are implementing uses the Python binding for ZooKeeper, but it sometimes crashes with a segfault. Here is the backtrace from gdb:

      Program received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 0xad244b70 (LWP 28216)]
      0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
      at ../Objects/abstract.c:2488
      2488 ../Objects/abstract.c: No such file or directory.
      in ../Objects/abstract.c
      (gdb) bt
      #0 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
      at ../Objects/abstract.c:2488
      #1 0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0,
      arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
      #2 0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194)
      at ../Objects/abstract.c:2480
      #3 0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1,
      path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314
      #4 0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1,
      path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275
      #5 deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "",
      list=0xa5354140) at src/zk_hashtable.c:317
      #6 0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
      #7 0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
      #8 0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
      #9 0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6

          Activity

          Ice made changes -
          Priority Critical [ 2 ] Major [ 3 ]
          Mahadev konar made changes -
          Fix Version/s 3.5.0 [ 12316644 ]
          Fix Version/s 3.4.0 [ 12314469 ]
          Mahadev konar added a comment -

          Moving this out to 3.5 release.

          Mahadev konar added a comment -

          Any update on this? Should we try and get this to 3.4 release?

          Austin Shoemaker added a comment -

          I had also uploaded a ZOOKEEPER-888 patch based on the 3.3 branch (ZOOKEEPER-888-3.3.patch). Maybe that helps?

          Patrick Hunt made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Patrick Hunt added a comment -

          Looks like the patch is failing to apply. Could someone update and resubmit?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12453927/ZOOKEEPER-740.patch
          against trunk revision 1033155.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/18//console

          This message is automatically generated.

          Andrei Savu added a comment -

          It seems like this issue is fixed on the trunk by patch ZOOKEEPER-888 but unfortunately the 3.3 branch still gives a segfault. I've tested using the code sample provided by Mike Solomon.

          Austin Shoemaker added a comment -

          ZOOKEEPER-740.patch fixes the crash, though it looks like the pywatcher_t will be leaked on an unrecoverable session state change (EXPIRED_SESSION_STATE or AUTH_FAILED_STATE). Attached a proposed revision to ZOOKEEPER-888 for your review.
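
The leak described above can be stated as a small decision rule. The following Python sketch is not the ZOOKEEPER-888 patch: `should_free_watcher` is a hypothetical helper, and the numeric constants are assumed to match the C client's zookeeper.h, so verify them before relying on them. It releases a one-shot watcher after an ordinary event, keeps it across recoverable session events, and releases it once the session reaches a state it cannot recover from.

```python
# Assumed to mirror the C client's constants; verify against zookeeper.h.
ZOO_SESSION_EVENT = -1
ZOO_CONNECTING_STATE = 1
ZOO_CONNECTED_STATE = 3
ZOO_EXPIRED_SESSION_STATE = -112
ZOO_AUTH_FAILED_STATE = -113

# Session states after which no further events can arrive on this handle.
UNRECOVERABLE_STATES = frozenset(
    [ZOO_EXPIRED_SESSION_STATE, ZOO_AUTH_FAILED_STATE]
)


def should_free_watcher(permanent, event_type, state):
    """Return True if a watcher may be released after this dispatch.

    Hypothetical helper for illustration only.
    """
    if permanent:
        # Assumption: permanent watchers are released elsewhere,
        # for example when the handle is closed.
        return False
    if event_type != ZOO_SESSION_EVENT:
        # An ordinary event consumes a one-shot watcher.
        return True
    # Session events are re-delivered, so only free on a terminal state.
    return state in UNRECOVERABLE_STATES
```

Under this rule a disconnect followed by a reconnect leaves the watcher alive, while an expired session lets it be reclaimed instead of leaked.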

          Austin Shoemaker made changes -
          Link This issue is duplicated by ZOOKEEPER-888 [ ZOOKEEPER-888 ]
          Andrei Savu made changes -
          Status Reopened [ 4 ] Patch Available [ 10002 ]
          Andrei Savu made changes -
          Attachment ZOOKEEPER-740.patch [ 12453927 ]
          Andrei Savu added a comment -

          I've created a patch as Mike suggested. I've been able to reproduce the issue and the change seems to fix it. I don't know whether it prevents watchers from eventually being correctly freed. I've also run all the tests and everything seems to work fine.

          Henry Robinson added a comment -

          Mike -

          Great catch, thanks for figuring this out.

          I'm correct in saying that this doesn't prevent watchers from eventually being correctly freed, right?

          If so, then it would be great if you could submit this patch formally so that we can get it into trunk. See http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute for details.

          Thanks,
          Henry

          Mike Solomon added a comment -

          The common case is when you supply a watcher for a get().

          Start a zk server on localhost and create a node /zk.

          Connect a python process like this:

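          import sys
          import zookeeper

          zh = zookeeper.init('localhost:2181')

          def _zk_callback(*args):
              # Python 2 print statement; writes the watcher arguments to stderr.
              print >> sys.stderr, "_zk_callback", args

          # Read /zk and register _zk_callback as a watcher on it.
          zookeeper.get(zh, '/zk', _zk_callback)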

          Kill the zk server. The client will idle fine. When restarting the zk server, I get a SIGSEGV on reconnect 100% of the time.

          This is fixed by the following patch:

          [msolomon]yuriko:~/src/zookeeper-3.3.1/src/contrib/zkpython> svn di
          Index: src/c/zookeeper.c
          ===================================================================
          --- src/c/zookeeper.c (revision 951628)
          +++ src/c/zookeeper.c (working copy)
          @@ -436,7 +436,9 @@
             if (PyObject_CallObject((PyObject*)callback, arglist) == NULL) {
               PyErr_Print();
             }
          -  if (pyw->permanent == 0) {
          +  // msolomon: when a session event happens, watchers get dispatched,
          +  // but they are retained in the C client for dispatch again.
          +  if (pyw->permanent == 0 && type != ZOO_SESSION_EVENT) {
               free_pywatcher(pyw);
             }
             PyGILState_Release(gstate);
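
To make the failure mode concrete, here is a small pure-Python model of the lifetime rule the patch changes. It is only a sketch: `Watcher`, `WatcherTable`, `dispatch`, and `run` are hypothetical names, not zkpython or C client APIs, and the `free_on_session_event` flag stands in for the behaviour before (True) and after (False) the patch. The only details taken from this issue are that session events have type -1, as in the backtraces, and that the C client keeps watchers registered after delivering a session event.

```python
ZOO_SESSION_EVENT = -1   # matches type=-1 in this issue's backtraces
ZOO_CHANGED_EVENT = 3    # assumed value, used only for illustration


class Watcher(object):
    """Stands in for a pywatcher_t: a callback plus a 'freed' marker."""

    def __init__(self, callback, permanent=False):
        self.callback = callback
        self.permanent = permanent
        self.freed = False


class WatcherTable(object):
    """Hypothetical model of the C client's table of registered watchers."""

    def __init__(self, free_on_session_event):
        self.free_on_session_event = free_on_session_event
        self.watchers = []

    def register(self, watcher):
        self.watchers.append(watcher)

    def dispatch(self, event_type):
        for w in list(self.watchers):
            if w.freed:
                # In C this is the use-after-free that ends in SIGSEGV.
                raise RuntimeError("dispatch to a freed watcher")
            w.callback(event_type)
            if w.permanent:
                continue
            if event_type != ZOO_SESSION_EVENT:
                # One-shot watcher consumed: the table drops it and the
                # wrapper frees it.
                self.watchers.remove(w)
                w.freed = True
            elif self.free_on_session_event:
                # Pre-patch behaviour: freed, but still in the table.
                w.freed = True


def run(free_on_session_event):
    """Deliver two session events and one data event to a single watcher."""
    seen = []
    table = WatcherTable(free_on_session_event)
    table.register(Watcher(seen.append))
    table.dispatch(ZOO_SESSION_EVENT)   # e.g. the server goes away
    table.dispatch(ZOO_SESSION_EVENT)   # e.g. the client reconnects
    table.dispatch(ZOO_CHANGED_EVENT)   # the watched node changes
    return seen
```

In this model `run(False)` delivers all three events, while `run(True)` raises on the second session event, which corresponds to the crash on reconnect.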

          Henry Robinson made changes -
          Fix Version/s 3.3.1 [ 12314846 ]
          Henry Robinson added a comment -

          Can't reproduce, or diagnose without code, moving to 3.4.0.

          Mahadev konar added a comment -

          Henry,

          Looks like we can move this out of the 3.3.1 release. Do you want to assign this to just 3.4?

          Henry Robinson made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Henry Robinson added a comment -

          Ok, thanks for the update. Can you share the code that you are running to give the segfault? That will make it much easier for me to diagnose.

          Federico added a comment -

          I applied the patch but I still get a segfault:

          Program received signal SIGSEGV, Segmentation fault.
          [Switching to Thread 0xad244b70 (LWP 18205)]
          0x080611d5 in PyObject_Call (func=0x87fa8a0, arg=0xb53d9b4, kw=0x0) at ../Objects/abstract.c:2488
          2488 ../Objects/abstract.c: No such file or directory.
          in ../Objects/abstract.c
          (gdb) bt
          #0 0x080611d5 in PyObject_Call (func=0x87fa8a0, arg=0xb53d9b4, kw=0x0) at ../Objects/abstract.c:2488
          #1 0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x87fa8a0, arg=0xb53d9b4, kw=0x0) at ../Python/ceval.c:3575
          #2 0x080612a0 in PyObject_CallObject (o=0x87fa8a0, a=0xb53d9b4) at ../Objects/abstract.c:2480
          #3 0x0047afb2 in watcher_dispatch (zzh=0x8617038, type=-1, state=1, path=0x8631028 "", context=0x8617258) at src/c/zookeeper.c:419
          #4 0x00497559 in do_foreach_watcher (zh=0x8617038, type=-1, state=1, path=0x8631028 "", list=0x8a71088) at src/zk_hashtable.c:275
          #5 deliverWatchers (zh=0x8617038, type=-1, state=1, path=0x8631028 "", list=0x8a71088) at src/zk_hashtable.c:317
          #6 0x0048be3c in process_completions (zh=0x8617038) at src/zookeeper.c:1766
          #7 0x0049806b in do_completion (v=0x8617038) at src/mt_adaptor.c:333
          #8 0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
          #9 0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6

          I'm not sure it's resolved.

          Patrick Hunt made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Patrick Hunt added a comment -

          Fixed as part of ZOOKEEPER-631

          Henry Robinson made changes -
          Fix Version/s 3.4.0 [ 12314469 ]
          Henry Robinson made changes -
          Assignee Henry Robinson [ henryr ]
          Henry Robinson added a comment -

          Adding link - 631 should hopefully resolve this.

          Henry Robinson made changes -
          Link This issue relates to ZOOKEEPER-631 [ ZOOKEEPER-631 ]
          Federico added a comment -

          Thanks. We use servers running Ubuntu 9.10 (32-bit) with the Python from the repository (version 2.6.4).

          Henry Robinson added a comment -

          Thanks for the bug report, Federico. There are a few similar issues with zkpython that are surely related - I must make sure to address them soon.

          Can you let me know what version of Python and what operating system you are running?

          Lei Zhang made changes -
          Field Original Value New Value
          Link This issue duplicates ZOOKEEPER-670 [ ZOOKEEPER-670 ]
          Federico created issue -

            People

            • Assignee: Henry Robinson
            • Reporter: Federico
            • Votes: 0
            • Watchers: 4
