Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.2, 3.3.0
    • Fix Version/s: 3.3.1, 3.4.0
    • Labels:
      None
    • Environment:

      Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 (python 2.5.1)

      Description

      On write operations, getting:

      Fatal Python error: deallocating None
      Aborted

      This error happens on write operations only. Here's the backtrace:

      Fatal Python error: deallocating None

      Program received signal SIGABRT, Aborted.
      0x000000383fc30215 in raise () from /lib64/libc.so.6
      (gdb) bt
      #0 0x000000383fc30215 in raise () from /lib64/libc.so.6
      #1 0x000000383fc31cc0 in abort () from /lib64/libc.so.6
      #2 0x00002adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
      #3 0x00002adbd0bc7493 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
      #4 0x00002adbd0bcab66 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
      #5 0x00002adbd0bcbfe5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
      #6 0x00002adbd0bcc032 in PyEval_EvalCode () from /usr/lib64/libpython2.4.so.1.0
      #7 0x00002adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
      #8 0x00002adbd0be9bd8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.4.so.1.0
      #9 0x00002adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
      #10 0x000000383fc1d974 in __libc_start_main () from /lib64/libc.so.6
      #11 0x0000000000400629 in _start ()

      1. ASF.LICENSE.NOT.GRANTED--ZOOKEEPER-742.patch
        2 kB
        Henry Robinson
      2. ASF.LICENSE.NOT.GRANTED--ZOOKEEPER-742.patch
        7 kB
        Henry Robinson
      3. ASF.LICENSE.NOT.GRANTED--foo.p
        0.6 kB
        Josh Fraser
      4. ASF.LICENSE.NOT.GRANTED--commands.py
        2 kB
        Josh Fraser

        Issue Links

          Activity

          Hide
          Patrick Hunt added a comment -

          Closing issue as ZOOKEEPER-631 has landed.

          Show
          Patrick Hunt added a comment - Closing issue as ZOOKEEPER-631 has landed.
          Hide
          Henry Robinson added a comment -

          Patch to ZOOKEEPER-631 should fix this issue - when that is committed, we can close out this ticket.

          Show
          Henry Robinson added a comment - Patch to ZOOKEEPER-631 should fix this issue - when that is committed, we can close out this ticket.
          Hide
          Josh Fraser added a comment -

          SUCCESS! Thanks!

          Show
          Josh Fraser added a comment - SUCCESS! Thanks!
          Hide
          Henry Robinson added a comment -

          This patch does a better job of handling references to Py_None - let me know if this helps.

          Show
          Henry Robinson added a comment - This patch does a better job of handling references to Py_None - let me know if this helps.
          Hide
          Josh Fraser added a comment -

          Added output from the latest attempts with the patched file

          Show
          Josh Fraser added a comment - Added output from the latest attempts with the patched file
          Hide
          Josh Fraser added a comment -

          4088] futex(0xdcc3e78, FUTEX_WAKE_PRIVATE, 1) = 0
          [pid 4088] write(1, "('/audubon/group/pod/twitter1/me"..., 65('/audubon/group/pod/twitter1/members/sjc1a003.twitter.com', '')
          ) = 65
          [pid 4088] write(4, "\0", 1) = 1
          [pid 4089] <... poll resumed> ) = 1 ([

          {fd=3, revents=POLLIN}

          ])
          [pid 4088] futex(0xdcc3e4c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
          [pid 4089] read(3, "\0", 128) = 1
          [pid 4089] poll([

          {fd=3, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 2, 3332) = 1 ([{fd=5, revents=POLLOUT}])
          [pid 4089] sendto(5, "\0\0\0E", 4, 0, NULL, 0) = 4
          [pid 4089] sendto(5, "K\310=\314\0\0\0\3\0\0\0008/audubon/group/pod/t"..., 69, 0, NULL, 0) = 69
          [pid 4089] poll([{fd=3, events=POLLIN}

          ,

          {fd=5, events=POLLIN}], 2, 3333) = 1 ([{fd=5, revents=POLLIN}])
          [pid 4089] recvfrom(5, "\0\0\0\20", 4, 0, NULL, NULL) = 4
          [pid 4089] recvfrom(5, "K\310=\314\377\377\377\377\377\377\377\377\377\377\377\233", 16, 0, NULL, NULL) = 16
          [pid 4089] futex(0xdcc3e4c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xdcc3e78, 2) = 1
          [pid 4088] <... futex resumed> ) = 0
          [pid 4089] poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}

          ], 2, 3333 <unfinished ...>
          [pid 4088] futex(0xdcc3e78, FUTEX_WAKE_PRIVATE, 1) = 0
          [pid 4088] write(2, "Fatal Python error: deallocating"..., 38Fatal Python error: deallocating None
          ) = 38
          [pid 4088] rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
          [pid 4088] tgkill(4088, 4088, SIGABRT) = 0
          [pid 4088] — SIGABRT (Aborted) @ 0 (0) —
          upeek: ptrace(PTRACE_PEEKUSER,4089,120,0): No such process
          Process 4089 detached
          [pid 4090] +++ killed by SIGABRT +++

          Show
          Josh Fraser added a comment - 4088] futex(0xdcc3e78, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 4088] write(1, "('/audubon/group/pod/twitter1/me"..., 65('/audubon/group/pod/twitter1/members/sjc1a003.twitter.com', '') ) = 65 [pid 4088] write(4, "\0", 1) = 1 [pid 4089] <... poll resumed> ) = 1 ([ {fd=3, revents=POLLIN} ]) [pid 4088] futex(0xdcc3e4c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 4089] read(3, "\0", 128) = 1 [pid 4089] poll([ {fd=3, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 2, 3332) = 1 ( [{fd=5, revents=POLLOUT}] ) [pid 4089] sendto(5, "\0\0\0E", 4, 0, NULL, 0) = 4 [pid 4089] sendto(5, "K\310=\314\0\0\0\3\0\0\0008/audubon/group/pod/t"..., 69, 0, NULL, 0) = 69 [pid 4089] poll([{fd=3, events=POLLIN} , {fd=5, events=POLLIN}], 2, 3333) = 1 ( [{fd=5, revents=POLLIN}] ) [pid 4089] recvfrom(5, "\0\0\0\20", 4, 0, NULL, NULL) = 4 [pid 4089] recvfrom(5, "K\310=\314\377\377\377\377\377\377\377\377\377\377\377\233", 16, 0, NULL, NULL) = 16 [pid 4089] futex(0xdcc3e4c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xdcc3e78, 2) = 1 [pid 4088] <... futex resumed> ) = 0 [pid 4089] poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN} ], 2, 3333 <unfinished ...> [pid 4088] futex(0xdcc3e78, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 4088] write(2, "Fatal Python error: deallocating"..., 38Fatal Python error: deallocating None ) = 38 [pid 4088] rt_sigprocmask(SIG_UNBLOCK, [ABRT] , NULL, 8) = 0 [pid 4088] tgkill(4088, 4088, SIGABRT) = 0 [pid 4088] — SIGABRT (Aborted) @ 0 (0) — upeek: ptrace(PTRACE_PEEKUSER,4089,120,0): No such process Process 4089 detached [pid 4090] +++ killed by SIGABRT +++
          Hide
          Josh Fraser added a comment -

          #0 0x000000383fc30215 in raise () from /lib64/libc.so.6
          #1 0x000000383fc31cc0 in abort () from /lib64/libc.so.6
          #2 0x00002b55a4441189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
          #3 0x00002b55a4420493 in PyEval_EvalFrame ()
          from /usr/lib64/libpython2.4.so.1.0
          #4 0x00002b55a4423b66 in PyEval_EvalFrame ()
          from /usr/lib64/libpython2.4.so.1.0
          #5 0x00002b55a4424fe5 in PyEval_EvalCodeEx ()
          from /usr/lib64/libpython2.4.so.1.0
          #6 0x00002b55a4425032 in PyEval_EvalCode ()
          from /usr/lib64/libpython2.4.so.1.0
          #7 0x00002b55a4441729 in ?? () from /usr/lib64/libpython2.4.so.1.0
          #8 0x00002b55a4442bd8 in PyRun_SimpleFileExFlags ()
          from /usr/lib64/libpython2.4.so.1.0
          #9 0x00002b55a444900d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
          #10 0x000000383fc1d974 in __libc_start_main () from /lib64/libc.so.6
          #11 0x0000000000400629 in _start ()
          (gdb)

          Show
          Josh Fraser added a comment - #0 0x000000383fc30215 in raise () from /lib64/libc.so.6 #1 0x000000383fc31cc0 in abort () from /lib64/libc.so.6 #2 0x00002b55a4441189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0 #3 0x00002b55a4420493 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #4 0x00002b55a4423b66 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #5 0x00002b55a4424fe5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #6 0x00002b55a4425032 in PyEval_EvalCode () from /usr/lib64/libpython2.4.so.1.0 #7 0x00002b55a4441729 in ?? () from /usr/lib64/libpython2.4.so.1.0 #8 0x00002b55a4442bd8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.4.so.1.0 #9 0x00002b55a444900d in Py_Main () from /usr/lib64/libpython2.4.so.1.0 #10 0x000000383fc1d974 in __libc_start_main () from /lib64/libc.so.6 #11 0x0000000000400629 in _start () (gdb)
          Hide
          Henry Robinson added a comment -

          Josh - are you able to apply this patch and try it out? It's not ready to be committed, but it adds a bit of defensive programming that should hopefully fix the error you're seeing.

          Show
          Henry Robinson added a comment - Josh - are you able to apply this patch and try it out? It's not ready to be committed, but it adds a bit of defensive programming that should hopefully fix the error you're seeing.
          Hide
          Henry Robinson added a comment -

          The bad news is I can't recreate the error, but the good news is that there are only four call sites where this could be happening (the error is to decrease a reference to Py_None, which is a bad idea).

          I can have a patch for you to try up here shortly.

          Show
          Henry Robinson added a comment - The bad news is I can't recreate the error, but the good news is that there are only four call sites where this could be happening (the error is to decrease a reference to Py_None, which is a bad idea). I can have a patch for you to try up here shortly.
          Hide
          Josh Fraser added a comment -

          Attached commands.py – nothing private in the file.

          Show
          Josh Fraser added a comment - Attached commands.py – nothing private in the file.
          Hide
          Henry Robinson added a comment -

          Thanks very much for this - any chance you can share Commands as well, so that I can see the actual zookeeper API calls that are being made? Let me know if you're not comfortable posting it publicly.

          Show
          Henry Robinson added a comment - Thanks very much for this - any chance you can share Commands as well, so that I can see the actual zookeeper API calls that are being made? Let me know if you're not comfortable posting it publicly.
          Hide
          Josh Fraser added a comment -

          attached the file since pasting lost the formatting

          Show
          Josh Fraser added a comment - attached the file since pasting lost the formatting
          Hide
          Josh Fraser added a comment -

          Here's the script it its entirety. I tried adding the sleep to see if it mattered how quickly the writes were occurring.

          #!/usr/bin/env python

          from zktools.commands import Commands
          import time

          cmd = Commands()

          for i in cmd.listNode('/audubon/group'):
          group_type = ''
          group_members = []
          group_name = i.split('/')[-1]
          try:
          group_type = cmd.catNode('%s/type' % i)[0]
          group_members = [ h.split('/')[-1] for h in cmd.listNode('%s/members' % i) ]
          for member in group_members:
          value = cmd.setNode('/audubon/group/%s/%s/members/%s' % (group_type, group_name, member), '')
          print value
          time.sleep(1)
          except:
          print "group name: %s isn't happy" % group_name

          Show
          Josh Fraser added a comment - Here's the script it its entirety. I tried adding the sleep to see if it mattered how quickly the writes were occurring. #!/usr/bin/env python from zktools.commands import Commands import time cmd = Commands() for i in cmd.listNode('/audubon/group'): group_type = '' group_members = [] group_name = i.split('/') [-1] try: group_type = cmd.catNode('%s/type' % i) [0] group_members = [ h.split('/') [-1] for h in cmd.listNode('%s/members' % i) ] for member in group_members: value = cmd.setNode('/audubon/group/%s/%s/members/%s' % (group_type, group_name, member), '') print value time.sleep(1) except: print "group name: %s isn't happy" % group_name
          Hide
          Henry Robinson added a comment -

          Thanks Josh - can you share the portion of your script that is causing the problem?

          Show
          Henry Robinson added a comment - Thanks Josh - can you share the portion of your script that is causing the problem?
          Hide
          Josh Fraser added a comment -

          Here's an strace of what's happening when we get the error:

          pid 5468] futex(0x2aaaac00f7ac, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x2aaaac00f7d8, 2) = 1
          [pid 5467] <... futex resumed> ) = 0
          [pid 5468] poll([

          {fd=3, events=POLLIN}, {fd=5, events=POLLIN}], 2, 3332 <unfinished ...>
          [pid 5467] futex(0x2aaaac00f7d8, FUTEX_WAKE_PRIVATE, 1) = 0
          [pid 5467] write(1, "('/audubon/group/datacenter/sjc1"..., 68('/audubon/group/datacenter/sjc1/members/sjc1a250.twitter.com', '')
          ) = 68
          [pid 5467] select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
          [pid 5467] write(4, "\0", 1) = 1
          [pid 5468] <... poll resumed> ) = 1 ([{fd=3, revents=POLLIN}])
          [pid 5468] read(3, <unfinished ...>
          [pid 5467] futex(0x2aaaac00f7ac, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
          [pid 5468] <... read resumed> "\0", 128) = 1
          [pid 5468] poll([{fd=3, events=POLLIN}

          ,

          {fd=5, events=POLLIN|POLLOUT}

          ], 2, 2331) = 1 ([

          {fd=5, revents=POLLOUT}

          ])
          [pid 5468] sendto(5, "\0\0\0H", 4, 0, NULL, 0) = 4
          [pid 5468] sendto(5, "K\307\272\300\0\0\0\3\0\0\0;/audubon/group/datac"..., 72, 0, NULL, 0) = 72
          [pid 5468] poll([

          {fd=3, events=POLLIN}, {fd=5, events=POLLIN}], 2, 3333) = 1 ([{fd=5, revents=POLLIN}])
          [pid 5468] recvfrom(5, "\0\0\0\20", 4, 0, NULL, NULL) = 4
          [pid 5468] recvfrom(5, "K\307\272\300\0\0\0\17\0\2\244`\377\377\377\233", 16, 0, NULL, NULL) = 16
          [pid 5468] futex(0x2aaaac00f7ac, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x2aaaac00f7d8, 2) = 1
          [pid 5467] <... futex resumed> ) = 0
          [pid 5468] poll([{fd=3, events=POLLIN}

          ,

          {fd=5, events=POLLIN}

          ], 2, 3333 <unfinished ...>
          [pid 5467] futex(0x2aaaac00f7d8, FUTEX_WAKE_PRIVATE, 1) = 0
          [pid 5467] write(2, "Fatal Python error: deallocating"..., 38Fatal Python error: deallocating None
          ) = 38
          [pid 5467] rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
          [pid 5467] tgkill(5467, 5467, SIGABRT) = 0
          [pid 5467] — SIGABRT (Aborted) @ 0 (0) —
          upeek: ptrace(PTRACE_PEEKUSER,5468,120,0): No such process
          Process 5468 detached
          [pid 5469] +++ killed by SIGABRT +++

          Show
          Josh Fraser added a comment - Here's an strace of what's happening when we get the error: pid 5468] futex(0x2aaaac00f7ac, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x2aaaac00f7d8, 2) = 1 [pid 5467] <... futex resumed> ) = 0 [pid 5468] poll([ {fd=3, events=POLLIN}, {fd=5, events=POLLIN}], 2, 3332 <unfinished ...> [pid 5467] futex(0x2aaaac00f7d8, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 5467] write(1, "('/audubon/group/datacenter/sjc1"..., 68('/audubon/group/datacenter/sjc1/members/sjc1a250.twitter.com', '') ) = 68 [pid 5467] select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout) [pid 5467] write(4, "\0", 1) = 1 [pid 5468] <... poll resumed> ) = 1 ( [{fd=3, revents=POLLIN}] ) [pid 5468] read(3, <unfinished ...> [pid 5467] futex(0x2aaaac00f7ac, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 5468] <... read resumed> "\0", 128) = 1 [pid 5468] poll([{fd=3, events=POLLIN} , {fd=5, events=POLLIN|POLLOUT} ], 2, 2331) = 1 ([ {fd=5, revents=POLLOUT} ]) [pid 5468] sendto(5, "\0\0\0H", 4, 0, NULL, 0) = 4 [pid 5468] sendto(5, "K\307\272\300\0\0\0\3\0\0\0;/audubon/group/datac"..., 72, 0, NULL, 0) = 72 [pid 5468] poll([ {fd=3, events=POLLIN}, {fd=5, events=POLLIN}], 2, 3333) = 1 ( [{fd=5, revents=POLLIN}] ) [pid 5468] recvfrom(5, "\0\0\0\20", 4, 0, NULL, NULL) = 4 [pid 5468] recvfrom(5, "K\307\272\300\0\0\0\17\0\2\244`\377\377\377\233", 16, 0, NULL, NULL) = 16 [pid 5468] futex(0x2aaaac00f7ac, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x2aaaac00f7d8, 2) = 1 [pid 5467] <... futex resumed> ) = 0 [pid 5468] poll([{fd=3, events=POLLIN} , {fd=5, events=POLLIN} ], 2, 3333 <unfinished ...> [pid 5467] futex(0x2aaaac00f7d8, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 5467] write(2, "Fatal Python error: deallocating"..., 38Fatal Python error: deallocating None ) = 38 [pid 5467] rt_sigprocmask(SIG_UNBLOCK, [ABRT] , NULL, 8) = 0 [pid 5467] tgkill(5467, 5467, SIGABRT) = 0 [pid 5467] — SIGABRT (Aborted) @ 0 (0) — upeek: ptrace(PTRACE_PEEKUSER,5468,120,0): No such process Process 5468 detached [pid 5469] +++ killed by SIGABRT +++

            People

            • Assignee:
              Henry Robinson
              Reporter:
              Josh Fraser
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development