ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1051

SIGPIPE in Zookeeper 0.3.* when send'ing after cluster disconnection

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.3.2, 3.3.3, 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: c client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Add flag to socket send on Linux that prevents SIGPIPE from being fired should the Zookeeper cluster close the connection on its side.

      Description

      In libzookeeper_mt, if your process is going rather slowly (such as when running it in Valgrind's Memcheck) or you are using gdb with breakpoints, you can occasionally get SIGPIPE when trying to send a message to the cluster. For example:

      ==12788==
      ==12788== Process terminating with default action of signal 13 (SIGPIPE)
      ==12788== at 0x3F5180DE91: send (in /lib64/libpthread-2.5.so)
      ==12788== by 0x7F060AA: ??? (in /usr/lib64/libzookeeper_mt.so.2.0.0)
      ==12788== by 0x7F06E5B: zookeeper_process (in /usr/lib64/libzookeeper_mt.so.2.0.0)
      ==12788== by 0x7F0D38E: ??? (in /usr/lib64/libzookeeper_mt.so.2.0.0)
      ==12788== by 0x3F5180673C: start_thread (in /lib64/libpthread-2.5.so)
      ==12788== by 0x3F50CD3F6C: clone (in /lib64/libc-2.5.so)
      ==12788==

      This is probably not the behavior we would like, since we handle server disconnections after a failed call to send. To fix this, there are a few options we could use. For BSD environments, we can tell a socket to never send SIGPIPE with send using setsockopt:

      setsockopt(sd, SOL_SOCKET, SO_NOSIGPIPE, (void *)&set, sizeof(int));

      For Linux environments, we can add a MSG_NOSIGNAL flag to every send call that says to not send SIGPIPE on a bad file descriptor.

      For more information, see: http://stackoverflow.com/questions/108183/how-to-prevent-sigpipes-or-handle-them-properly

      1. ZOOKEEPER-1051.patch
        2 kB
        Stephen Tyree
      2. ZOOKEEPER-1051.patch
        0.6 kB
        Stephen Tyree

        Activity

        Hide
        Hudson added a comment -

        Integrated in ZooKeeper-trunk #1288 (See https://builds.apache.org/job/ZooKeeper-trunk/1288/)
        ZOOKEEPER-1051. SIGPIPE in Zookeeper 0.3.* when send'ing after cluster disconnection (Stephen Tyree via mahadev)

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1163106
        Files :

        • /zookeeper/trunk/CHANGES.txt
        • /zookeeper/trunk/src/c/src/zookeeper.c
        Show
        Hudson added a comment - Integrated in ZooKeeper-trunk #1288 (See https://builds.apache.org/job/ZooKeeper-trunk/1288/ ) ZOOKEEPER-1051 . SIGPIPE in Zookeeper 0.3.* when send'ing after cluster disconnection (Stephen Tyree via mahadev) mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1163106 Files : /zookeeper/trunk/CHANGES.txt /zookeeper/trunk/src/c/src/zookeeper.c
        Hide
        Mahadev konar added a comment -

        Just committed this. Thanks Stephen!

        Show
        Mahadev konar added a comment - Just committed this. Thanks Stephen!
        Hide
        Mahadev konar added a comment -

        I think this should go in to 3.4.0

        Show
        Mahadev konar added a comment - I think this should go in to 3.4.0
        Hide
        Stephen Tyree added a comment -

        There isn't an equivalent issue on Windows, only for POSIX-ish platforms that signal on failed socket sends. This means other Unix-like platforms could experience similar issues, but it appears in this case that for now we aren't going to concern ourselves with them.

        Is there anything left to do here or are we good for submission?

        Show
        Stephen Tyree added a comment - There isn't an equivalent issue on Windows, only for POSIX-ish platforms that signal on failed socket sends. This means other Unix-like platforms could experience similar issues, but it appears in this case that for now we aren't going to concern ourselves with them. Is there anything left to do here or are we good for submission?
        Hide
        Dheeraj Agrawal added a comment -

        +1 from windows standpoint. I couldn't find the windows equivalent of this flag in send() or setsocketopt. And i see that we are carrying the default behavior for windows, which looks good to me.

        Show
        Dheeraj Agrawal added a comment - +1 from windows standpoint. I couldn't find the windows equivalent of this flag in send() or setsocketopt. And i see that we are carrying the default behavior for windows, which looks good to me.
        Hide
        Benjamin Reed added a comment -

        +1 looks good to me

        Show
        Benjamin Reed added a comment - +1 looks good to me
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12491378/ZOOKEEPER-1051.patch
        against trunk revision 1159929.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/473//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/473//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/473//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12491378/ZOOKEEPER-1051.patch against trunk revision 1159929. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/473//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/473//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/473//console This message is automatically generated.
        Hide
        Stephen Tyree added a comment -

        Realized I missed a couple of spots, so I generalized the logic in a function. Haven't the ability to test the change on Windows.

        Show
        Stephen Tyree added a comment - Realized I missed a couple of spots, so I generalized the logic in a function. Haven't the ability to test the change on Windows.
        Hide
        Stephen Tyree added a comment -

        Patch which fixes the issue for Linux and Linux alone. Do we support Zookeeper for other *nixes? If so, there are likely to be platform specific solutions to the problem for them as well.

        Show
        Stephen Tyree added a comment - Patch which fixes the issue for Linux and Linux alone. Do we support Zookeeper for other *nixes? If so, there are likely to be platform specific solutions to the problem for them as well.

          People

          • Assignee:
            Stephen Tyree
            Reporter:
            Stephen Tyree
          • Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 2h
              2h
              Remaining:
              Remaining Estimate - 2h
              2h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development