ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-888

c-client / zkpython: Double free corruption on node watcher

    Details

    • Hadoop Flags:
      Reviewed

      Description

      the c-client / zkpython wrapper invokes already freed watcher callback

      steps to reproduce:
      0. start a zookeper server on your machine
      1. run the attached python script
      2. suspend the zookeeper server process (e.g. using `pkill -STOP -f org.apache.zookeeper.server.quorum.QuorumPeerMain` )
      3. wait until the connection and the node observer fired with a session event
      4. resume the zookeeper server process (e.g. using `pkill -CONT -f org.apache.zookeeper.server.quorum.QuorumPeerMain` )

      -> the client tries to dispatch the node observer function again, but it was already freed -> double free corruption

      1. ZOOKEEPER-888-3.3.patch
        4 kB
        Austin Shoemaker
      2. ZOOKEEPER-888.patch
        5 kB
        Austin Shoemaker
      3. resume-segfault.py
        0.9 kB
        Lukas

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in ZooKeeper-trunk #972 (See https://hudson.apache.org/hudson/job/ZooKeeper-trunk/972/)
          ZOOKEEPER-888: c-client / zkpython: Double free corruption on node
          watcher (Austin Shoemaker via henryr)

          Show
          Hudson added a comment - Integrated in ZooKeeper-trunk #972 (See https://hudson.apache.org/hudson/job/ZooKeeper-trunk/972/ ) ZOOKEEPER-888 : c-client / zkpython: Double free corruption on node watcher (Austin Shoemaker via henryr)
          Hide
          Henry Robinson added a comment -

          I just committed this to origin/branch-3.3 and origin/trunk.

          Thanks both!

          Show
          Henry Robinson added a comment - I just committed this to origin/branch-3.3 and origin/trunk. Thanks both!
          Hide
          Austin Shoemaker added a comment -

          Patch based on the 3.3 branch attached (ZOOKEEPER-888-3.3.patch). Verified that unit tests pass with the changes, including the new watcher_test.

          Show
          Austin Shoemaker added a comment - Patch based on the 3.3 branch attached ( ZOOKEEPER-888 -3.3.patch). Verified that unit tests pass with the changes, including the new watcher_test.
          Hide
          Patrick Hunt added a comment -

          Can we backport the 853 change into 3.3, but not change the API? Big overhead or straightforward/simple?

          Show
          Patrick Hunt added a comment - Can we backport the 853 change into 3.3, but not change the API? Big overhead or straightforward/simple?
          Hide
          Henry Robinson added a comment -

          The patch as it stands relies on ZOOKEEPER-853 (which it fixes) which is not in 3.3 as it is a small API change - it changes is_unrecoverable to return Python True or False, rather than ZINVALIDSTATE.

          So I'm not certain about what to do here - we try not to change APIs between minor versions. However, this is a very minor change, and this patch fixes a significant bug. I'm inclined to commit both 853 and this patch to 3.3 as well as trunk, and put a note in the release notes.

          Any objections?

          Show
          Henry Robinson added a comment - The patch as it stands relies on ZOOKEEPER-853 (which it fixes) which is not in 3.3 as it is a small API change - it changes is_unrecoverable to return Python True or False, rather than ZINVALIDSTATE. So I'm not certain about what to do here - we try not to change APIs between minor versions. However, this is a very minor change, and this patch fixes a significant bug. I'm inclined to commit both 853 and this patch to 3.3 as well as trunk, and put a note in the release notes. Any objections?
          Hide
          Austin Shoemaker added a comment -

          Updated ZOOKEEPER-888.patch with the following changes:

          • Fixed zookeeper.is_unrecoverable to return the correct value, it was returning false in all cases.
          • Added watcher_test.py to cover the issue this patch fixes. Verified that it crashes before patching and succeeds afterward.
          Show
          Austin Shoemaker added a comment - Updated ZOOKEEPER-888 .patch with the following changes: Fixed zookeeper.is_unrecoverable to return the correct value, it was returning false in all cases. Added watcher_test.py to cover the issue this patch fixes. Verified that it crashes before patching and succeeds afterward.
          Hide
          Henry Robinson added a comment -

          The patch looks good to me - thanks!

          Could you add a test case that verifies the correct behaviour, if possible? (I appreciate it can be hard to fake unrecoverable session errors). We keep circling around the correct behaviour for this code block, and I'd like to capture it in a test suite.

          Show
          Henry Robinson added a comment - The patch looks good to me - thanks! Could you add a test case that verifies the correct behaviour, if possible? (I appreciate it can be hard to fake unrecoverable session errors). We keep circling around the correct behaviour for this code block, and I'd like to capture it in a test suite.
          Hide
          Austin Shoemaker added a comment -

          Improved patch attached. Before, watcher_dispatch would unconditionally free non-global watcher objects.

          Any number of recoverable session state change events may be sent to the watcher. This change frees the watcher only on the last callback- a data change event or unrecoverable session state change.

          Show
          Austin Shoemaker added a comment - Improved patch attached. Before, watcher_dispatch would unconditionally free non-global watcher objects. Any number of recoverable session state change events may be sent to the watcher. This change frees the watcher only on the last callback- a data change event or unrecoverable session state change.
          Hide
          Austin Shoemaker added a comment -

          Path that prevents freeing a watcher in response to a session event, per the feedback in ZOOKEEPER-890.

          Show
          Austin Shoemaker added a comment - Path that prevents freeing a watcher in response to a session event, per the feedback in ZOOKEEPER-890 .
          Hide
          Lukas added a comment -

          Seems to demonstrate the same issue.

          Show
          Lukas added a comment - Seems to demonstrate the same issue.
          Hide
          Patrick Hunt added a comment -

          Borderline blocker, Henry any insight on this? Something that can be addressed for 3.3.2?

          Show
          Patrick Hunt added a comment - Borderline blocker, Henry any insight on this? Something that can be addressed for 3.3.2?
          Hide
          Lukas added a comment -

          Example code for triggering the bug

          Show
          Lukas added a comment - Example code for triggering the bug

            People

            • Assignee:
              Lukas
              Reporter:
              Lukas
            • Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development