Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.5.0
    • Fix Version/s: None
    • Component/s: performance, task
    • Labels:
      None
    • Target Version/s:

      Description

      Currently, the IFile format used by the MR shuffle checksums all data using the zlib CRC32 polynomial. Allowing CRC32C instead would let us take advantage of the native hardware CRC32C implementation, which would substantially reduce CPU usage (approximately half a second of CPU time saved per GB checksummed).
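      As a rough illustration of why the checksum type is part of the data format: CRC32 and CRC32C use different polynomials, so the same bytes produce different checksums and a reader must use the same algorithm as the writer. This sketch uses the JDK's java.util.zip.CRC32 and java.util.zip.CRC32C classes (CRC32C only exists from Java 9, which postdates this issue; Hadoop ships its own CRC32C implementations) and prints the standard check values for the ASCII string "123456789".

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class CrcPolynomials {
    // Compute a checksum over the whole array with the given algorithm.
    public static long checksum(Checksum sum, byte[] data) {
        sum.reset();
        sum.update(data, 0, data.length);
        return sum.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // Standard check values for the two polynomials differ.
        System.out.printf("CRC32  = %08x%n", checksum(new CRC32(), data));  // cbf43926
        System.out.printf("CRC32C = %08x%n", checksum(new CRC32C(), data)); // e3069283
    }
}
```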

      1. mapreduce-5962.txt (14 kB, Todd Lipcon)
      2. mapreduce-5962.txt (14 kB, Todd Lipcon)

          Activity

          Todd Lipcon added a comment -

          Realized that for this to be effective we also need to implement the Checksum interface in native code. Currently the native code supports only the "chunked sums" verification used by HDFS and doesn't implement the Java Checksum.update interface that IFile uses. Will hold off on this patch for the time being.
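          To make the distinction concrete: chunked sums compute a separate checksum for each fixed-size chunk, whereas a streaming Checksum is fed arbitrary-sized pieces through update() and keeps one running value. A minimal sketch of the streaming contract, using the JDK's java.util.zip.Checksum interface with CRC32C (Java 9+) as an assumed stand-in for a native implementation, shows that piecewise updates give the same result as a single update over the whole buffer.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class StreamingChecksum {
    // Feed data to the checksum in fixed-size pieces, as a stream writer would
    // when bytes arrive one buffer at a time.
    public static long streamed(Checksum sum, byte[] data, int pieceSize) {
        sum.reset();
        for (int off = 0; off < data.length; off += pieceSize) {
            int len = Math.min(pieceSize, data.length - off);
            sum.update(data, off, len);
        }
        return sum.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "some intermediate map output".getBytes(StandardCharsets.UTF_8);
        Checksum oneShot = new CRC32C();
        oneShot.update(data, 0, data.length);
        long whole = oneShot.getValue();
        long pieces = streamed(new CRC32C(), data, 5);
        System.out.println(whole == pieces); // true
    }
}
```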

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12656376/mapreduce-5962.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4752//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4752//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4752//console

          This message is automatically generated.

          Todd Lipcon added a comment -

          The RawKVIteratorReader used in the shuffle wasn't passing the jobConf into the IFileReader, which caused an NPE when we tried to read the checksum type from the conf. I changed it to pass the jobConf, which may also give a slight performance advantage by avoiding the "new Configuration()" call in IFileInputStream's constructor. Verified that the two unit tests that failed before now pass on my machine.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12656314/mapreduce-5962.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapred.TestReduceFetch
          org.apache.hadoop.mapred.TestReduceFetchFromPartialMem

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4750//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4750//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4750//console

          This message is automatically generated.

          Todd Lipcon added a comment -

          (FWIW, this depends on James Thomas's work to enable native checksumming on byte arrays, so we won't see an immediate benefit; we will once that patch is done.)

          Todd Lipcon added a comment -

          The attached patch adds a new configuration to set the IFile checksum type. I changed the default to CRC32C since it's much faster when the native libraries are available.

          I don't believe this is an incompatible change, since IFiles are only used internally within a single job (written by the map, read by the reduce), so the reader and writer are never different versions. That said, if anyone has concerns about this, they can set the default back to CRC32 cluster-wide.
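          The compatibility argument rests on the writer and reader resolving the checksum type from the same job configuration. A minimal sketch of that idea follows; the configuration key, the Map-based conf, and the 4-byte checksum trailer are all hypothetical and are not meant to reproduce the actual patch or the real IFile layout. It uses the JDK's CRC32/CRC32C classes (Java 9+) and also shows that a reader configured with a different type would reject the data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class IFileChecksumSketch {
    // Hypothetical configuration key, for illustration only.
    public static final String CHECKSUM_KEY = "example.ifile.checksum.type";

    // Pick the checksum algorithm from the (job) configuration, defaulting to CRC32C.
    public static Checksum newChecksum(Map<String, String> conf) {
        String type = conf.getOrDefault(CHECKSUM_KEY, "CRC32C");
        switch (type) {
            case "CRC32":  return new CRC32();
            case "CRC32C": return new CRC32C();
            default: throw new IllegalArgumentException("Unknown checksum type: " + type);
        }
    }

    // Writer side (hypothetical layout): payload followed by a 4-byte checksum trailer.
    public static byte[] write(Map<String, String> conf, byte[] payload) {
        Checksum sum = newChecksum(conf);
        sum.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 4)
                .put(payload)
                .putInt((int) sum.getValue())
                .array();
    }

    // Reader side: recompute with the reader's configured type and compare to the trailer.
    public static boolean verify(Map<String, String> conf, byte[] file) {
        int n = file.length - 4;
        Checksum sum = newChecksum(conf);
        sum.update(file, 0, n);
        int stored = ByteBuffer.wrap(file, n, 4).getInt();
        return stored == (int) sum.getValue();
    }

    public static void main(String[] args) {
        byte[] payload = "key\tvalue".getBytes(StandardCharsets.UTF_8);
        Map<String, String> jobConf = Map.of(CHECKSUM_KEY, "CRC32C");
        byte[] file = write(jobConf, payload);
        // Writer and reader share the same job conf, so verification succeeds.
        System.out.println(verify(jobConf, file)); // true
        // A reader configured with the other polynomial rejects the data.
        System.out.println(verify(Map.of(CHECKSUM_KEY, "CRC32"), file)); // false
    }
}
```

          Because map and reduce tasks of one job are configured from the same job conf, the mismatch case should not arise in practice; it would only occur if the writer and reader were configured independently.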


            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0
              Watchers:
              8

              Dates

              • Created:
                Updated:
