YARN-297 (Hadoop YARN): Improve hashCode implementations for PB records

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0-beta
    • Component/s: None
    • Labels:
      None

      Description

      As Radim Kolar pointed out in YARN-2, we use very small primes in all our hashCode implementations.

      1. YARN.297.1.patch
        6 kB
        Xuan Gong
      2. YARN-297.2.patch
        5 kB
        Xuan Gong

        Activity

        hsn Radim Kolar added a comment -

        You also use an initial (seed) value of 1 or -1. It is best to use a different prime per class.
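A minimal Java sketch of why per-class constants can matter. It assumes two hypothetical record types (not actual YARN classes) that each hash two int fields. With shared constants (seed 1, multiplier 31), both types produce identical hash codes for identical field values. With a distinct pair of primes per class (1009/7919 and 2003/3001 are arbitrary picks for illustration), they do not. Whether this matters in practice depends on whether objects of different types end up in the same hash table or are combined into one hash.

```java
// Hypothetical demo class; the "A" and "B" methods stand in for the
// hashCode() of two different record types with the same field layout.
final class SharedVsPerClassHash {

    // Shared constants: every class uses seed 1 and multiplier 31.
    static int sharedA(int x, int y) {
        int result = 1;
        result = 31 * result + x;
        result = 31 * result + y;
        return result;
    }

    static int sharedB(int x, int y) {
        int result = 1;
        result = 31 * result + x;
        result = 31 * result + y;
        return result;
    }

    // Per-class constants: each class gets its own seed prime and multiplier prime.
    static int perClassA(int x, int y) {
        int result = 1009;
        result = 7919 * result + x;
        result = 7919 * result + y;
        return result; // int overflow wraps silently, which is fine for a hash
    }

    static int perClassB(int x, int y) {
        int result = 2003;
        result = 3001 * result + x;
        result = 3001 * result + y;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sharedA(3, 5) == sharedB(3, 5));     // same constants, same hash
        System.out.println(perClassA(3, 5) == perClassB(3, 5)); // different constants
    }
}
```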

        xgong Xuan Gong added a comment -

        So, these Java files use a small prime (31) and a seed value of 1 to generate the hash value:
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
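For illustration, here is a sketch of the seed-1, multiplier-31 pattern described above, applied to a hypothetical ID class with a long timestamp and an int sequence number. The class name, field names and field order are assumptions, not copied from the YARN source. Because 31 is small relative to the range of the second field, the pair (timestamp 0, id 31) collides with (timestamp 1, id 0).

```java
// Hypothetical stand-in for an ID record using the "31 and seed 1" pattern.
final class OldStyleId {
    private final long clusterTimestamp;
    private final int id;

    OldStyleId(long clusterTimestamp, int id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof OldStyleId)) {
            return false;
        }
        OldStyleId other = (OldStyleId) o;
        return clusterTimestamp == other.clusterTimestamp && id == other.id;
    }

    @Override
    public int hashCode() {
        final int prime = 31; // small multiplier
        int result = 1;       // seed value
        // Fold the long into an int the same way Long.hashCode() does.
        result = prime * result + (int) (clusterTimestamp ^ (clusterTimestamp >>> 32));
        result = prime * result + id;
        return result;
    }

    public static void main(String[] args) {
        // Two unequal IDs that share a hash code.
        System.out.println(new OldStyleId(0L, 31).hashCode());
        System.out.println(new OldStyleId(1L, 0).hashCode());
    }
}
```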

        xgong Xuan Gong added a comment -

        Radim, do you have a recommendation for a larger prime number we can use? Maybe we could also make the seed value larger, using a larger prime instead of 1?

        hsn Radim Kolar added a comment -

        I use 4-digit primes generated at http://www.numberempire.com/primenumbers.php. Every class should have a different initial value and constant.

        Type some random number there and the form will find the nearest prime. For every class needing hashCode(), obtain 2 primes.
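A sketch of the suggested recipe, assuming a hypothetical HostPortKey class: one prime serves as the initial (seed) value and another as the multiplier, both 4-digit primes reserved for this class (2003 and 7919 are illustrative picks, not values from the patch).

```java
import java.util.Objects;

// Hypothetical record class. SEED and MULTIPLIER are illustrative 4-digit primes
// chosen for this class only; another class would get its own pair.
final class HostPortKey {
    private static final int SEED = 2003;
    private static final int MULTIPLIER = 7919;

    private final String host;
    private final int port;

    HostPortKey(String host, int port) {
        this.host = Objects.requireNonNull(host, "host");
        this.port = port;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof HostPortKey)) {
            return false;
        }
        HostPortKey other = (HostPortKey) o;
        return port == other.port && host.equals(other.host);
    }

    @Override
    public int hashCode() {
        int result = SEED;                              // per-class initial value
        result = MULTIPLIER * result + host.hashCode(); // fold in each field in turn
        result = MULTIPLIER * result + port;
        return result;                                  // int overflow wraps, which is fine for hashing
    }
}
```

Whatever constants are chosen, equals() and hashCode() must stay consistent: both use the same fields, so equal keys always produce equal hash codes.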

        xgong Xuan Gong added a comment -

        Thanks. Could you kindly explain why we prefer to use a bigger prime number in the hashCode function?

        xgong Xuan Gong added a comment -

        Thanks. Radim, could you kindly explain why we prefer to use a bigger prime number in the hashCode function?

        hsn Radim Kolar added a comment -

        Bigger numbers give hashCodes a larger spread, so collisions are less likely when you sum hashCodes.

        Look at the source code for java.lang.ThreadLocal.
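A small experiment, under assumed inputs, that illustrates the spread argument. It hashes every pair (a, b) with 0 <= a, b < 100 using the seed-1 pattern and counts the distinct hash codes for multiplier 31 versus the hypothetical 4-digit prime 7919. It also mimics the java.lang.ThreadLocal idea: that class advances its hash by the constant 0x61c88647 and masks it with (table length - 1). Because the increment is odd, the first n hashes fall into n distinct slots of a power-of-two table of size n.

```java
import java.util.HashSet;
import java.util.Set;

final class SpreadDemo {

    // Two-field hash with seed 1; only the multiplier varies.
    static int hash(int multiplier, int a, int b) {
        int result = 1;
        result = multiplier * result + a;
        result = multiplier * result + b;
        return result;
    }

    // Counts distinct hash codes over all pairs (a, b) with 0 <= a, b < range.
    static int distinctHashes(int multiplier, int range) {
        Set<Integer> seen = new HashSet<>();
        for (int a = 0; a < range; a++) {
            for (int b = 0; b < range; b++) {
                seen.add(hash(multiplier, a, b));
            }
        }
        return seen.size();
    }

    // The increment java.lang.ThreadLocal adds for each new instance.
    static final int HASH_INCREMENT = 0x61c88647;

    // Slots used by the first n hashes in a table of size n (n must be a power of two).
    static int distinctSlots(int n) {
        Set<Integer> slots = new HashSet<>();
        int h = 0;
        for (int i = 0; i < n; i++) {
            slots.add(h & (n - 1)); // index = hash masked by (length - 1)
            h += HASH_INCREMENT;    // int overflow wraps, as in ThreadLocal
        }
        return slots.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctHashes(31, 100));
        System.out.println(distinctHashes(7919, 100));
        System.out.println(distinctSlots(16));
    }
}
```

It prints 3169, 10000 and 16. With multiplier 31, many of the 10000 pairs share a hash code (for example (0, 31) and (1, 0)). The multiplier 7919 exceeds the range of b, so every pair gets its own code. For fields with a wider range, a larger multiplier reduces such collisions but cannot eliminate them.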

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12563010/YARN.297.1.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/369//console

        This message is automatically generated.

        hitesh Hitesh Shah added a comment -

        @Xuan, please rebase your patch and upload it again as the previous one failed to get past Jenkins. Thanks.

        xgong Xuan Gong added a comment -

        Recreated the patch based on the latest trunk version.
        Randomly chose big prime integers for those hashCode() functions.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12574269/YARN-297.2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/537//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/537//console

        This message is automatically generated.

        xgong Xuan Gong added a comment -

        Uses a very big prime integer in the hashCode() functions in order to decrease collisions.
        No new tests are needed.

        hitesh Hitesh Shah added a comment -

        +1. Will commit shortly.

        hitesh Hitesh Shah added a comment -

        Committed to branch-2 and trunk.

        hudson Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3498 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3498/)
        YARN-297. Improve hashCode implementations for PB records. Contributed by Xuan Gong. (Revision 1459054)

        Result = SUCCESS
        hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459054
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
        hudson Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #162 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/162/)
        YARN-297. Improve hashCode implementations for PB records. Contributed by Xuan Gong. (Revision 1459054)

        Result = SUCCESS
        hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459054
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
        hudson Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1351 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1351/)
        YARN-297. Improve hashCode implementations for PB records. Contributed by Xuan Gong. (Revision 1459054)

        Result = FAILURE
        hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459054
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
        hudson Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1379 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1379/)
        YARN-297. Improve hashCode implementations for PB records. Contributed by Xuan Gong. (Revision 1459054)

        Result = SUCCESS
        hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459054
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java

          People

          • Assignee:
            xgong Xuan Gong
            Reporter:
            acmurthy Arun C Murthy
          • Votes:
            0
            Watchers:
            7
