Hadoop Common
  1. Hadoop Common
  2. HADOOP-5727

Faster, simpler id.hashCode() which does not allocate memory

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Integer.valueOf allocates memory if the integer is not in the object-cache, which is the vast majority of cases for the task id. It is possible to compute the hash code of an integer without going via the integer cache, and hence avoiding allocating memory.

      1. 00_id-noallocate.patch
        0.3 kB
        Shevek
      2. 03_id-noallocate.patch
        0.5 kB
        Shevek
      3. 5727-0.patch
        0.4 kB
        Chris Douglas

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk #827 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/827/)
          . Simplify hashcode for ID types. Contributed by Shevek

          Show
          Hudson added a comment - Integrated in Hadoop-trunk #827 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/827/ ) . Simplify hashcode for ID types. Contributed by Shevek
          Hide
          Chris Douglas added a comment -

          I committed this. Thanks, Shevek

          Show
          Chris Douglas added a comment - I committed this. Thanks, Shevek
          Hide
          Chris Douglas added a comment -

          Removed comments

          Show
          Chris Douglas added a comment - Removed comments
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12406649/03_id-noallocate.patch
          against trunk revision 770044.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/264/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/264/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/264/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/264/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12406649/03_id-noallocate.patch against trunk revision 770044. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/264/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/264/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/264/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/264/console This message is automatically generated.
          Hide
          Chris Douglas added a comment -

          As pointed out, the patch returns exactly what Integer::hashCode did, so the fix only removes the object alloc (which is probably also optimized out, but returning id is clearer). The supplemental hashing in HashMap is interesting; purely out of curiosity, do you happen to have any references for broader, J2SE changes?

          HashMap is a list-hash, so consecutive hashes followed by a hash collision does not cause a walk of a long linear hash chain.

          Are you sure? The openjdk6 source appears to create/walk a list for collisions (addEntry/get)...

          The current patch looks good, but the comments keep context for this particular fix. Would you mind removing them?

          Show
          Chris Douglas added a comment - As pointed out, the patch returns exactly what Integer::hashCode did, so the fix only removes the object alloc (which is probably also optimized out, but returning id is clearer). The supplemental hashing in HashMap is interesting; purely out of curiosity, do you happen to have any references for broader, J2SE changes? HashMap is a list-hash, so consecutive hashes followed by a hash collision does not cause a walk of a long linear hash chain. Are you sure? The openjdk6 source appears to create/walk a list for collisions (addEntry/get)... The current patch looks good, but the comments keep context for this particular fix. Would you mind removing them?
          Hide
          Shevek added a comment -

          This is a good point. It is not necessary to create an object. However, returning the id directly might not lead to a good hash code. We may have to add a hash code implementation.

          This discussion was rampant in the very early days of Java: Consecutive objects, including String, etc, have always had consecutive hash codes. In fact, it does not really matter because:

          • HashMap is a list-hash, so consecutive hashes followed by a hash collision does not cause a walk of a long linear hash chain. The length of the chain walked because of a collision will only be 2.
          • HashMap implements a supplementary transformation on hash codes (one of the Mersenne Twisters?), so it is not necessary to ensure distribution or uniformity of the basic hash codes. In fact, HashMap probably does better than the application code would do.
          • Other Map strategies, such as R-B tree, do not use hashCode().
          • In general, the issue of consecutive objects, especially numbers, generating consecutive hash codes is accepted and understood by library authors who require more uniform distribution of hash codes, and account is taken at that point.
          • The J2SE internally uses this strategy, and they spend a lot more time thinking about these problems than we do.

          I therefore submit that this patch should be applied as-is.

          Show
          Shevek added a comment - This is a good point. It is not necessary to create an object. However, returning the id directly might not lead to a good hash code. We may have to add a hash code implementation. This discussion was rampant in the very early days of Java: Consecutive objects, including String, etc, have always had consecutive hash codes. In fact, it does not really matter because: HashMap is a list-hash, so consecutive hashes followed by a hash collision does not cause a walk of a long linear hash chain. The length of the chain walked because of a collision will only be 2. HashMap implements a supplementary transformation on hash codes (one of the Mersenne Twisters?), so it is not necessary to ensure distribution or uniformity of the basic hash codes. In fact, HashMap probably does better than the application code would do. Other Map strategies, such as R-B tree, do not use hashCode(). In general, the issue of consecutive objects, especially numbers, generating consecutive hash codes is accepted and understood by library authors who require more uniform distribution of hash codes, and account is taken at that point. The J2SE internally uses this strategy, and they spend a lot more time thinking about these problems than we do. I therefore submit that this patch should be applied as-is.
          Hide
          Shevek added a comment -

          Another attempt at the patch, running svn diff from a different root.

          Show
          Shevek added a comment - Another attempt at the patch, running svn diff from a different root.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12406213/00_id-noallocate.patch
          against trunk revision 768376.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/240/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12406213/00_id-noallocate.patch against trunk revision 768376. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/240/console This message is automatically generated.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          This is a good point. It is not necessary to create an object. However, returning the id directly might not lead to a good hash code. We may have to add a hash code implementation.

          A related issue: HADOOP-3383

          Show
          Tsz Wo Nicholas Sze added a comment - This is a good point. It is not necessary to create an object. However, returning the id directly might not lead to a good hash code. We may have to add a hash code implementation. A related issue: HADOOP-3383
          Hide
          Shevek added a comment -

          aha, I found the patch upload button - it's not the one marked "submit patch"!

          Show
          Shevek added a comment - aha, I found the patch upload button - it's not the one marked "submit patch"!

            People

            • Assignee:
              Shevek
              Reporter:
              Shevek
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development