Hadoop YARN
  1. Hadoop YARN
  2. YARN-297

Improve hashCode implementations for PB records

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0-beta
    • Component/s: None
    • Labels:
      None

      Description

      As Radim Kolar pointed out in YARN-2, we use very small primes in all our hashCode implementations.

      1. YARN.297.1.patch
        6 kB
        Xuan Gong
      2. YARN-297.2.patch
        5 kB
        Xuan Gong

        Activity

        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1379 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1379/)
        YARN-297. Improve hashCode implementations for PB records. Contributed by Xuan Gong. (Revision 1459054)

        Result = SUCCESS
        hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459054
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1379 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1379/ ) YARN-297 . Improve hashCode implementations for PB records. Contributed by Xuan Gong. (Revision 1459054) Result = SUCCESS hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459054 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1351 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1351/)
        YARN-297. Improve hashCode implementations for PB records. Contributed by Xuan Gong. (Revision 1459054)

        Result = FAILURE
        hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459054
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1351 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1351/ ) YARN-297 . Improve hashCode implementations for PB records. Contributed by Xuan Gong. (Revision 1459054) Result = FAILURE hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459054 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #162 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/162/)
        YARN-297. Improve hashCode implementations for PB records. Contributed by Xuan Gong. (Revision 1459054)

        Result = SUCCESS
        hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459054
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
        Show
        Hudson added a comment - Integrated in Hadoop-Yarn-trunk #162 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/162/ ) YARN-297 . Improve hashCode implementations for PB records. Contributed by Xuan Gong. (Revision 1459054) Result = SUCCESS hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459054 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3498 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3498/)
        YARN-297. Improve hashCode implementations for PB records. Contributed by Xuan Gong. (Revision 1459054)

        Result = SUCCESS
        hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459054
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
        Show
        Hudson added a comment - Integrated in Hadoop-trunk-Commit #3498 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3498/ ) YARN-297 . Improve hashCode implementations for PB records. Contributed by Xuan Gong. (Revision 1459054) Result = SUCCESS hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459054 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
        Hide
        Hitesh Shah added a comment -

        Committed to branch-2 and trunk.

        Show
        Hitesh Shah added a comment - Committed to branch-2 and trunk.
        Hide
        Hitesh Shah added a comment -

        +1. Will commit shortly.

        Show
        Hitesh Shah added a comment - +1. Will commit shortly.
        Hide
        Xuan Gong added a comment -

        Use a very big prime integer number for hashcode() function in order to decrease collision.
        Not need to add tests.

        Show
        Xuan Gong added a comment - Use a very big prime integer number for hashcode() function in order to decrease collision. Not need to add tests.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12574269/YARN-297.2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/537//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/537//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574269/YARN-297.2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/537//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/537//console This message is automatically generated.
        Hide
        Xuan Gong added a comment -

        Recreate the patch based on the latest trunk version.
        Randomly choose the big prime integer for those hashCode() functions.

        Show
        Xuan Gong added a comment - Recreate the patch based on the latest trunk version. Randomly choose the big prime integer for those hashCode() functions.
        Hide
        Hitesh Shah added a comment -

        @Xuan, please rebase your patch and upload it again as the previous one failed to get past Jenkins. Thanks.

        Show
        Hitesh Shah added a comment - @Xuan, please rebase your patch and upload it again as the previous one failed to get past Jenkins. Thanks.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12563010/YARN.297.1.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/369//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12563010/YARN.297.1.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/369//console This message is automatically generated.
        Hide
        Radim Kolar added a comment -

        bigger numbers will give hashCodes larger spread and it will less likely to generate collision if you sum hashCodes.

        look at source code for java.lang.ThreadLocal

        Show
        Radim Kolar added a comment - bigger numbers will give hashCodes larger spread and it will less likely to generate collision if you sum hashCodes. look at source code for java.lang.ThreadLocal
        Hide
        Xuan Gong added a comment -

        Thanks. Radim, Could you kind educate me why we prefer to use bigger prime number in the hashCode function ?

        Show
        Xuan Gong added a comment - Thanks. Radim, Could you kind educate me why we prefer to use bigger prime number in the hashCode function ?
        Hide
        Xuan Gong added a comment -

        Thanks. Could you kind educate me why we prefer to use bigger prime number in the hashCode function ?

        Show
        Xuan Gong added a comment - Thanks. Could you kind educate me why we prefer to use bigger prime number in the hashCode function ?
        Hide
        Radim Kolar added a comment -

        I use 4 number primes generated at http://www.numberempire.com/primenumbers.php Every class should have different initial value and constant.

        Type some random number there and form will find you nearest prime. For every class needing hashCode() obtain 2 primes.

        Show
        Radim Kolar added a comment - I use 4 number primes generated at http://www.numberempire.com/primenumbers.php Every class should have different initial value and constant. Type some random number there and form will find you nearest prime. For every class needing hashCode() obtain 2 primes.
        Hide
        Xuan Gong added a comment -

        Radim, any recommendation on the larger prime number we can use ? And maybe we can set the seed value a little bit larger, use larger prime value instead of 1 ?

        Show
        Xuan Gong added a comment - Radim, any recommendation on the larger prime number we can use ? And maybe we can set the seed value a little bit larger, use larger prime value instead of 1 ?
        Hide
        Xuan Gong added a comment -

        So, those java files use small primes,31, and seed value as 1 to generate the hash value:
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java
        ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java

        Show
        Xuan Gong added a comment - So, those java files use small primes,31, and seed value as 1 to generate the hash value: ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java ./hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
        Hide
        Radim Kolar added a comment -

        You also use initial (seed) value 1 or -1. Best is to use different prime per class.

        Show
        Radim Kolar added a comment - You also use initial (seed) value 1 or -1. Best is to use different prime per class.

          People

          • Assignee:
            Xuan Gong
            Reporter:
            Arun C Murthy
          • Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development