Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 3.2.1
    • Fix Version/s: 3.3.0
    • Component/s: server
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      the server and connection "sent" stat is not being updated. if you run "stat" on the client port the sent packets is much lower than it should be

      seems that sendbuffer is not updating the stats when it shortcircuits the send.

      1. ZOOKEEPER-558.patch
        69 kB
        Patrick Hunt
      2. ZOOKEEPER-558.patch
        70 kB
        Patrick Hunt

        Issue Links

          Activity

          Hide
          Patrick Hunt added a comment -

          looks like this may be more serious, afaict some sendbuffer(closeconn) will (may) be ignored. not sure this was thought out when the short-circuit optimization was made. esp around four letter words.

          Show
          Patrick Hunt added a comment - looks like this may be more serious, afaict some sendbuffer(closeconn) will (may) be ignored. not sure this was thought out when the short-circuit optimization was made. esp around four letter words.
          Hide
          Patrick Hunt added a comment -

          This patch addresses the sent count issue (counter update was missing), additionally I addressed a number
          of session establishment/teardown issues that I found while fixing this:

          1) better logging of session est/teardown, much of this was based on user feedback such as ZOOKEEPER-565 ZOOKEEPER-564
          Much of this was cleaning up the log text and using common terminology for the messages themselves
          1.1) some of 1) required better error handling. A lot of the changed code was throwing general IOExceptions, rather than throwing
          more specific exceptions allowing higher layer code to properly report the issue.
          2) while closing a client connection the connection close response was not always sent to the client
          3) in some cases the server would not close the connection (closeconn buffer was not being queued) properly. this would happen anyway when the client closed the connection, but the server should do the right thing regardless

          4) calling close more than once just returns after the first call
          5) added testableWaitForShutdown to ZooKeeper to allow tests to validate that the client has shutdown all threads
          (this is test only)

          6) reading a client sockets now reads the entire request in one shot if possible. previously we would re-poll the
          selector after reading the length header.

          Show
          Patrick Hunt added a comment - This patch addresses the sent count issue (counter update was missing), additionally I addressed a number of session establishment/teardown issues that I found while fixing this: 1) better logging of session est/teardown, much of this was based on user feedback such as ZOOKEEPER-565 ZOOKEEPER-564 Much of this was cleaning up the log text and using common terminology for the messages themselves 1.1) some of 1) required better error handling. A lot of the changed code was throwing general IOExceptions, rather than throwing more specific exceptions allowing higher layer code to properly report the issue. 2) while closing a client connection the connection close response was not always sent to the client 3) in some cases the server would not close the connection (closeconn buffer was not being queued) properly. this would happen anyway when the client closed the connection, but the server should do the right thing regardless 4) calling close more than once just returns after the first call 5) added testableWaitForShutdown to ZooKeeper to allow tests to validate that the client has shutdown all threads (this is test only) 6) reading a client sockets now reads the entire request in one shot if possible. previously we would re-poll the selector after reading the length header.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12423863/ZOOKEEPER-558.patch
          against trunk revision 831486.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/45/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/45/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/45/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12423863/ZOOKEEPER-558.patch against trunk revision 831486. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/45/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/45/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/45/console This message is automatically generated.
          Hide
          Patrick Hunt added a comment -

          Same as last patch except cleared up the findbugs issue.

          Show
          Patrick Hunt added a comment - Same as last patch except cleared up the findbugs issue.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12423930/ZOOKEEPER-558.patch
          against trunk revision 831486.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/46/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/46/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/46/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12423930/ZOOKEEPER-558.patch against trunk revision 831486. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/46/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/46/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/46/console This message is automatically generated.
          Hide
          Mahadev konar added a comment -

          was looking at the patch.

          any reason to add

          private static final long serialVersionUID
          
          

          in the new exceptions? The exceptions are IOException and serializing and deserializing them as IOException serialversion should be fine. no?

          Show
          Mahadev konar added a comment - was looking at the patch. any reason to add private static final long serialVersionUID in the new exceptions? The exceptions are IOException and serializing and deserializing them as IOException serialversion should be fine. no?
          Hide
          Patrick Hunt added a comment -

          eclipse warns otw if you don't have the uid. no other reason

          Show
          Patrick Hunt added a comment - eclipse warns otw if you don't have the uid. no other reason
          Hide
          Henry Robinson added a comment -

          Big +1 on this patch from me; a bunch of very necessary refactorings as well the bug fix. Couple of tiny nits

          {{+ // this is ugly, you have a better way speak up
          + if (e instanceof SessionExpiredException)

          { + LOG.info(e.getMessage() + ", closing socket connection"); + }

          else if (e instanceof SessionTimeoutException)

          { + LOG.info(e.getMessage() + RETRY_CONN_MSG); + } else if (e instanceof EndOfStreamException) { + LOG.info(e.getMessage() + RETRY_CONN_MSG); + }

          else

          { + LOG.warn("Session 0x" + + Long.toHexString(getSessionId()) + + " for server " + + ((SocketChannel)sockKey.channel()) + .socket().getRemoteSocketAddress() + + ", unexpected error" + + RETRY_CONN_MSG, + e); + }

          }}

          Is the better way to have catch clauses for each exception type?.

          {{+ try

          { + sock.write(bb); + }

          catch (IOException e)

          { + // we are just doing best effort right now + }

          }}

          Debug message here?

          Show
          Henry Robinson added a comment - Big +1 on this patch from me; a bunch of very necessary refactorings as well the bug fix. Couple of tiny nits {{+ // this is ugly, you have a better way speak up + if (e instanceof SessionExpiredException) { + LOG.info(e.getMessage() + ", closing socket connection"); + } else if (e instanceof SessionTimeoutException) { + LOG.info(e.getMessage() + RETRY_CONN_MSG); + } else if (e instanceof EndOfStreamException) { + LOG.info(e.getMessage() + RETRY_CONN_MSG); + } else { + LOG.warn("Session 0x" + + Long.toHexString(getSessionId()) + + " for server " + + ((SocketChannel)sockKey.channel()) + .socket().getRemoteSocketAddress() + + ", unexpected error" + + RETRY_CONN_MSG, + e); + } }} Is the better way to have catch clauses for each exception type?. {{+ try { + sock.write(bb); + } catch (IOException e) { + // we are just doing best effort right now + } }} Debug message here?
          Hide
          Patrick Hunt added a comment -

          The idea with catching Exception (current code before/after patch) is to warn if
          something unexpected happens (esp so that we'll know about
          it and have a chance to fix). Except in the case where we are closing the
          connection. All kinds of weird errors could be thrown (not just by write afaict)
          and we don't want to bother the user with that, at least not at warn level during close.
          I don't think there's one or two of these potential sites, afaict that's why the code
          had this originally (a big net).

          That's the theory at least behind the refactorings in the patch...

          I'd like this to go through unless you have something concrete in mind - I'd be happy
          to explore further as additional JIRAs for example. (if you feel strongly to wait
          then we could do that too)

          Show
          Patrick Hunt added a comment - The idea with catching Exception (current code before/after patch) is to warn if something unexpected happens (esp so that we'll know about it and have a chance to fix). Except in the case where we are closing the connection. All kinds of weird errors could be thrown (not just by write afaict) and we don't want to bother the user with that, at least not at warn level during close. I don't think there's one or two of these potential sites, afaict that's why the code had this originally (a big net). That's the theory at least behind the refactorings in the patch... I'd like this to go through unless you have something concrete in mind - I'd be happy to explore further as additional JIRAs for example. (if you feel strongly to wait then we could do that too)
          Hide
          Benjamin Reed added a comment -

          Committed revision 833620.

          Show
          Benjamin Reed added a comment - Committed revision 833620.
          Hide
          Benjamin Reed added a comment -

          i agree, breaking down the reasons behind the IOException is great! sorry i didn't see your comment before committing henry. i couldn't think of a nicer way to do the if conditions, but it would be cool to come up with one.

          Show
          Benjamin Reed added a comment - i agree, breaking down the reasons behind the IOException is great! sorry i didn't see your comment before committing henry. i couldn't think of a nicer way to do the if conditions, but it would be cool to come up with one.
          Hide
          Henry Robinson added a comment -

          No problem - like I say those were tiny issues and shouldn't have stopped the patch being committed.

          I understand about catching Exception; I was just wondering why we couldn't do

          catch (IOException e)

          { ... }
          catch (SessionExpireException e) { ... }

          catch (Exception e)

          { ... }

          rather than the explicit instanceof checks? I feel like I'm probably missing something subtle here, and it's moot anyhow since the patch went in

          Show
          Henry Robinson added a comment - No problem - like I say those were tiny issues and shouldn't have stopped the patch being committed. I understand about catching Exception; I was just wondering why we couldn't do catch (IOException e) { ... } catch (SessionExpireException e) { ... } catch (Exception e) { ... } rather than the explicit instanceof checks? I feel like I'm probably missing something subtle here, and it's moot anyhow since the patch went in
          Hide
          Patrick Hunt added a comment -

          I'm probably snow blind from looking at it too long.

          Henry, easiest way for you to explain it is to create a jira and attach a patch. This is just
          a small step, still lots of opportunity for improving the user experience in particular.

          Show
          Patrick Hunt added a comment - I'm probably snow blind from looking at it too long. Henry, easiest way for you to explain it is to create a jira and attach a patch. This is just a small step, still lots of opportunity for improving the user experience in particular.
          Hide
          Hudson added a comment -

          Integrated in ZooKeeper-trunk #522 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/522/)

          Show
          Hudson added a comment - Integrated in ZooKeeper-trunk #522 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/522/ )

            People

            • Assignee:
              Patrick Hunt
              Reporter:
              Patrick Hunt
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development