Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: None
    • Labels: None

      Description

      Rehash the value returned by Object.hashCode() to get a better distribution of keys across partitions.
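The idea in the description can be illustrated with a minimal, self-contained sketch. It does not reproduce the attached patches: the class name `RehashSketch`, the helper names, and the choice of bit-mixing function (modeled on the supplemental hash found in older `java.util.HashMap` implementations) are all assumptions made for illustration, and the committed RehashPartitioner may differ.

```java
// Illustrative sketch only; not the code from the rehash*.txt attachments.
final class RehashSketch {

    // Plain partitioning: mask off the sign bit, then take the remainder.
    static int plainPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Assumed mixing step: folds high-order bits into the low-order bits
    // so that keys differing only in their high bits land in different buckets.
    static int rehash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Partitioning on the rehashed value instead of the raw hashCode().
    static int rehashPartition(Object key, int numPartitions) {
        return (rehash(key.hashCode()) & Integer.MAX_VALUE) % numPartitions;
    }

    static int max(int[] counts) {
        int m = 0;
        for (int c : counts) {
            m = Math.max(m, c);
        }
        return m;
    }

    public static void main(String[] args) {
        int numPartitions = 32;
        int[] plain = new int[numPartitions];
        int[] mixed = new int[numPartitions];
        // Patterned keys: multiples of 32 share the same low five bits,
        // so plain modulo-32 partitioning sends every key to bucket 0.
        for (int i = 0; i < 10000; i++) {
            Integer key = i * 32;
            plain[plainPartition(key, numPartitions)]++;
            mixed[rehashPartition(key, numPartitions)]++;
        }
        System.out.println("largest bucket, plain hashCode: " + max(plain));
        System.out.println("largest bucket, rehashed:       " + max(mixed));
    }
}
```

With keys such as multiples of 32, the plain variant puts all 10000 keys into a single bucket, while the rehashed variant spreads them over several buckets; how evenly depends on the mixing function chosen.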

      1. rehash1.txt
        2 kB
        Radim Kolar
      2. rehash2.txt
        2 kB
        Radim Kolar
      3. rehash3.txt
        2 kB
        Radim Kolar
      4. rehash4.txt
        6 kB
        Radim Kolar

        Activity

        cutting Doug Cutting added a comment -

        This looks like a good addition. The javadoc might provide more detail, e.g., that a smoother partitioning may improve reduce time in some cases and should harm things in no cases, and that this is suggested for Integer and Long keys with simple patterns in their distributions.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12561407/rehash2.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3130//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3130//console

        This message is automatically generated.

        cutting Doug Cutting added a comment -

        This patch needs some unit tests.

        hsn Radim Kolar added a comment -

        HashPartitioner does not have unit tests either.

        hsn Radim Kolar added a comment -

        Fixed the javadoc comment.

        hsn Radim Kolar added a comment -

        Very smooth distribution for patterned input. If you were not defending people who depend on undocumented behavior, you would make it the default.

        Dumping buckets distribution: min=902 avg=1043 max=1184
        bucket 0 964 items, variance -0.07574304889741132
        bucket 1 1042 items, variance -9.587727708533077E-4
        bucket 2 1101 items, variance 0.05560882070949185
        bucket 3 1039 items, variance -0.003835091083413231
        bucket 4 1099 items, variance 0.053691275167785234
        bucket 5 1044 items, variance 9.587727708533077E-4
        bucket 6 998 items, variance -0.04314477468839885
        bucket 7 1040 items, variance -0.0028763183125599234
        bucket 8 1184 items, variance 0.13518696069031638
        bucket 9 976 items, variance -0.06423777564717162
        bucket 10 902 items, variance -0.13518696069031638
        bucket 11 1124 items, variance 0.07766059443911794
        bucket 12 931 items, variance -0.10738255033557047
        bucket 13 1094 items, variance 0.0488974113135187
        bucket 14 1152 items, variance 0.10450623202301054
        bucket 15 977 items, variance -0.06327900287631831
        bucket 16 1057 items, variance 0.013422818791946308
        bucket 17 1048 items, variance 0.004793863854266539
        bucket 18 1052 items, variance 0.00862895493767977
        bucket 19 1042 items, variance -9.587727708533077E-4
        bucket 20 1028 items, variance -0.014381591562799617
        bucket 21 1038 items, variance -0.004793863854266539
        bucket 22 1037 items, variance -0.005752636625119847
        bucket 23 1040 items, variance -0.0028763183125599234
        bucket 24 1084 items, variance 0.039309683604985615
        bucket 25 974 items, variance -0.06615532118887824
        bucket 26 954 items, variance -0.08533077660594439
        bucket 27 1122 items, variance 0.07574304889741132
        bucket 28 1009 items, variance -0.032598274209012464
        bucket 29 1095 items, variance 0.04985618408437201
        bucket 30 1109 items, variance 0.06327900287631831
        bucket 31 978 items, variance -0.062320230105465
        0 of 32 are too small or large buckets
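Checking the figures in the dump above, the "variance" column appears to be each bucket's relative deviation from the average, (count - avg) / avg; for example, (964 - 1043) / 1043 is about -0.0757. A minimal sketch of such a report follows. The class name `BucketReport`, the helper names, and the 0.20 threshold are hypothetical and not taken from the patch.

```java
// Illustrative sketch only; names and the threshold are assumptions.
final class BucketReport {

    // Relative deviation of one bucket from the average bucket size.
    static double relativeDeviation(int count, double avg) {
        return (count - avg) / avg;
    }

    // Counts buckets whose relative deviation exceeds maxError in either direction.
    static int countBadBuckets(int[] buckets, double maxError) {
        long total = 0;
        for (int c : buckets) {
            total += c;
        }
        double avg = (double) total / buckets.length;
        int bad = 0;
        for (int c : buckets) {
            if (Math.abs(relativeDeviation(c, avg)) > maxError) {
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // A few bucket counts taken from the dump above, compared against avg=1043.
        int[] sample = {964, 1042, 1101, 1039};
        double avg = 1043.0;
        for (int i = 0; i < sample.length; i++) {
            System.out.println("bucket " + i + " " + sample[i]
                    + " items, deviation " + relativeDeviation(sample[i], avg));
        }
        // Hypothetical threshold: flag buckets more than 20% away from the average.
        System.out.println(countBadBuckets(sample, 0.20)
                + " of " + sample.length + " are too small or large buckets");
    }
}
```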

        hsn Radim Kolar added a comment -

        Unit test added: it checks whether the hash function returns a smooth distribution for patterned input.

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12561627/rehash4.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3139//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3139//console

        This message is automatically generated.

        cutting Doug Cutting added a comment -

        I committed this.

        hudson Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3143 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3143/)
        MAPREDUCE-4887. Add RehashPartitioner, to smooth distributions with poor implementations of Object#hashCode(). Contributed by Radim Kolar. (Revision 1424158)

        Result = SUCCESS
        cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424158
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/RehashPartitioner.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestRehashPartitioner.java
        hudson Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #71 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/71/)
        MAPREDUCE-4887. Add RehashPartitioner, to smooth distributions with poor implementations of Object#hashCode(). Contributed by Radim Kolar. (Revision 1424158)

        Result = SUCCESS
        cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424158
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/RehashPartitioner.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestRehashPartitioner.java
        hudson Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1260 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1260/)
        MAPREDUCE-4887. Add RehashPartitioner, to smooth distributions with poor implementations of Object#hashCode(). Contributed by Radim Kolar. (Revision 1424158)

        Result = FAILURE
        cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424158
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/RehashPartitioner.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestRehashPartitioner.java
        hudson Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1291 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1291/)
        MAPREDUCE-4887. Add RehashPartitioner, to smooth distributions with poor implementations of Object#hashCode(). Contributed by Radim Kolar. (Revision 1424158)

        Result = SUCCESS
        cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424158
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/RehashPartitioner.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestRehashPartitioner.java

          People

          • Assignee: hsn Radim Kolar
          • Reporter: hsn Radim Kolar
          • Votes: 0
          • Watchers: 8
