ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1018

The connection permutation in get_addrs uses a weak and inefficient shuffle

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.3.2
    • Fix Version/s: 3.4.0
    • Component/s: c client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      After determining all of the addresses in the get_addrs function in the C client, the connection is permuted using the following code:

      setup_random();
      /* Permute */
      for(i = 0; i < zh->addrs_count; i++) {
      struct sockaddr_storage *s1 = zh->addrs + random()%zh->addrs_count;
      struct sockaddr_storage *s2 = zh->addrs + random()%zh->addrs_count;
      if (s1 != s2)

      { struct sockaddr_storage t = *s1; *s1 = *s2; *s2 = t; }

      }

      Not only does this shuffle produce an uneven permutation, but it is half as efficient as the Fisher-Yates shuffle which produces an unbiased one. It seems like it would be a simple fix to increase the randomness and efficiency of the shuffle by switching over to using Fisher-Yates.

        Issue Links

          Activity

          Hide
          Stephen Tyree added a comment -

          Patch to implement the change. I'm not sure if I diff'd this correctly, but this is it.

          Show
          Stephen Tyree added a comment - Patch to implement the change. I'm not sure if I diff'd this correctly, but this is it.
          Hide
          Mahadev konar added a comment -

          thanks stephen, I have seen quite some instances where the connections are uneven for a lot of client connections. Would you be able to verify this change with lets say running 3 zookeeper servers and having a lot of connections (100's) and seeing how its affected?

          Also, can you please look at the java code and see if it faces the same issue?

          Show
          Mahadev konar added a comment - thanks stephen, I have seen quite some instances where the connections are uneven for a lot of client connections. Would you be able to verify this change with lets say running 3 zookeeper servers and having a lot of connections (100's) and seeing how its affected? Also, can you please look at the java code and see if it faces the same issue?
          Hide
          Stephen Tyree added a comment -

          Although I am less familiar with the Java code, I found this snippet in org/main/apache/zookeeper/client/StaticHostProvider.java:

          Collections.shuffle(this.serverAddresses);

          This tells me that the Java library lacks this problem. Do you want me to package this up more neatly with an svn diff?

          Show
          Stephen Tyree added a comment - Although I am less familiar with the Java code, I found this snippet in org/main/apache/zookeeper/client/StaticHostProvider.java: Collections.shuffle(this.serverAddresses); This tells me that the Java library lacks this problem. Do you want me to package this up more neatly with an svn diff?
          Hide
          Mahadev konar added a comment -

          stephen, please do.

          Show
          Mahadev konar added a comment - stephen, please do.
          Hide
          Stephen Tyree added a comment -

          Running 1000 times with the old way, I get the following distribution:

          308 The connected host is: (address 1):2181
          369 The connected host is: (address 2):2181
          324 The connected host is: (address 3):2181

          Running 1000 times with the new way, I get the following distribution:

          333 The connected host is: (address 1):2181
          325 The connected host is: (address 2):2181
          343 The connected host is: (address 3):2181

          Hardly scientific, but an indication that the new way is better or at the least as good.

          Show
          Stephen Tyree added a comment - Running 1000 times with the old way, I get the following distribution: 308 The connected host is: (address 1):2181 369 The connected host is: (address 2):2181 324 The connected host is: (address 3):2181 Running 1000 times with the new way, I get the following distribution: 333 The connected host is: (address 1):2181 325 The connected host is: (address 2):2181 343 The connected host is: (address 3):2181 Hardly scientific, but an indication that the new way is better or at the least as good.
          Hide
          Stephen Tyree added a comment -

          Patch to implement the enhancement.

          Show
          Stephen Tyree added a comment - Patch to implement the enhancement.
          Hide
          Benjamin Reed added a comment -

          +1 looks great stephen!

          Show
          Benjamin Reed added a comment - +1 looks great stephen!
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12473695/ZOOKEEPER-1018.patch
          against trunk revision 1081936.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/190//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12473695/ZOOKEEPER-1018.patch against trunk revision 1081936. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/190//console This message is automatically generated.
          Hide
          Stephen Tyree added a comment -

          Is the problem that I didn't run 'svn diff' from the root? If so I'll do that and resubmit.

          Show
          Stephen Tyree added a comment - Is the problem that I didn't run 'svn diff' from the root? If so I'll do that and resubmit.
          Hide
          Stephen Tyree added a comment -

          Also, should I be adding tests? Or should I be listing what I did to test this?

          Show
          Stephen Tyree added a comment - Also, should I be adding tests? Or should I be listing what I did to test this?
          Hide
          Stephen Tyree added a comment -

          Updated, this time taking the diff from the root directory.

          Show
          Stephen Tyree added a comment - Updated, this time taking the diff from the root directory.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12473801/ZOOKEEPER-1018.patch
          against trunk revision 1081936.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/192//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/192//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/192//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12473801/ZOOKEEPER-1018.patch against trunk revision 1081936. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/192//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/192//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/192//console This message is automatically generated.
          Hide
          Patrick Hunt added a comment -

          Seems like this issue could be causing the issue seen in ZOOKEEPER-989 (???)

          Show
          Patrick Hunt added a comment - Seems like this issue could be causing the issue seen in ZOOKEEPER-989 (???)
          Hide
          Patrick Hunt added a comment -

          Would be great to have new tests for this, not sure how to declare success/failure though as we don't want false failure indications... Any ideas?

          Show
          Patrick Hunt added a comment - Would be great to have new tests for this, not sure how to declare success/failure though as we don't want false failure indications... Any ideas?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12473801/ZOOKEEPER-1018.patch
          against trunk revision 1082362.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/194//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/194//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/194//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12473801/ZOOKEEPER-1018.patch against trunk revision 1082362. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/194//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/194//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/194//console This message is automatically generated.
          Hide
          Patrick Hunt added a comment -

          Cancelling patch - seems the existing tests are failing with this change, in both cases I see a failure on:

          [exec] [exec] /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/c/tests/TestZookeeperInit.cc:297: Assertion: equality assertion failed [Expected: 3210, Actual : 2310]

          Probably good to address this, at the same time add some addl tests would be great.

          Show
          Patrick Hunt added a comment - Cancelling patch - seems the existing tests are failing with this change, in both cases I see a failure on: [exec] [exec] /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/c/tests/TestZookeeperInit.cc:297: Assertion: equality assertion failed [Expected: 3210, Actual : 2310] Probably good to address this, at the same time add some addl tests would be great.
          Hide
          Stephen Tyree added a comment -

          The reason the unit tests failed was due to the change in how random numbers impacted the order of servers. The code for shuffling addresses is correct, and making everything work just required changing a few numbers in the Mock_random generator.

          Show
          Stephen Tyree added a comment - The reason the unit tests failed was due to the change in how random numbers impacted the order of servers. The code for shuffling addresses is correct, and making everything work just required changing a few numbers in the Mock_random generator.
          Hide
          Mahadev konar added a comment -

          +1 looks good.

          pat, since the test passes now and is fixed, do you think we need another test case or should this suffice?

          Show
          Mahadev konar added a comment - +1 looks good. pat, since the test passes now and is fixed, do you think we need another test case or should this suffice?
          Hide
          Stephen Tyree added a comment -

          What is the status on this? Is adjusting the unit test for shuffling sufficient or is there more I need to do here?

          Show
          Stephen Tyree added a comment - What is the status on this? Is adjusting the unit test for shuffling sufficient or is there more I need to do here?
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12474749/ZOOKEEPER-1018.patch
          against trunk revision 1087821.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/220//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/220//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/220//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12474749/ZOOKEEPER-1018.patch against trunk revision 1087821. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/220//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/220//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/220//console This message is automatically generated.
          Hide
          Benjamin Reed added a comment -

          Committed revision 1088792.

          Show
          Benjamin Reed added a comment - Committed revision 1088792.
          Hide
          Benjamin Reed added a comment -

          thanx Stephen!

          Show
          Benjamin Reed added a comment - thanx Stephen!
          Hide
          Hudson added a comment -

          Integrated in ZooKeeper-trunk #1144 (See https://hudson.apache.org/hudson/job/ZooKeeper-trunk/1144/)
          ZOOKEEPER-1018. The connection permutation in get_addrs uses a weak and inefficient shuffle

          Show
          Hudson added a comment - Integrated in ZooKeeper-trunk #1144 (See https://hudson.apache.org/hudson/job/ZooKeeper-trunk/1144/ ) ZOOKEEPER-1018 . The connection permutation in get_addrs uses a weak and inefficient shuffle

            People

            • Assignee:
              Stephen Tyree
              Reporter:
              Stephen Tyree
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 2h
                2h
                Remaining:
                Remaining Estimate - 2h
                2h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development