Hadoop Map/Reduce
MAPREDUCE-2408

Make Gridmix emulate usage of data compression

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: contrib/gridmix
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Emulates the MapReduce compression feature in Gridmix. Compression emulation is turned on by default and can be disabled by setting 'gridmix.compression-emulation.enable' to 'false'. Use 'gridmix.compression-emulation.map-input.decompression-ratio', 'gridmix.compression-emulation.map-output.compression-ratio' and 'gridmix.compression-emulation.reduce-output.compression-ratio' to configure the compression ratios at the map input, map output and reduce output respectively. Currently, compression ratios in the range [0.07, 0.68] are supported. Gridmix automatically detects whether map input, map output and reduce output should emulate compression, based on the original job's compression-related configuration parameters.
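The settings named in the release note can be collected as in the following minimal sketch. The property names and the [0.07, 0.68] range come from the release note; the class name, the helper methods and the use of java.util.Properties as a stand-in for a Hadoop configuration object are assumptions made for illustration. Whether Gridmix itself rejects or adjusts out-of-range values is not stated in this issue, so the validation here is only a guard in the sketch.

```java
import java.util.Properties;

public class GridmixCompressionSettings {
    // Property names as given in the release note.
    static final String ENABLE = "gridmix.compression-emulation.enable";
    static final String MAP_INPUT_RATIO =
        "gridmix.compression-emulation.map-input.decompression-ratio";
    static final String MAP_OUTPUT_RATIO =
        "gridmix.compression-emulation.map-output.compression-ratio";
    static final String REDUCE_OUTPUT_RATIO =
        "gridmix.compression-emulation.reduce-output.compression-ratio";

    // Supported range quoted in the release note.
    static final double MIN_RATIO = 0.07;
    static final double MAX_RATIO = 0.68;

    /** Hypothetical helper: true if the ratio lies inside the supported range. */
    static boolean isSupportedRatio(double ratio) {
        return ratio >= MIN_RATIO && ratio <= MAX_RATIO;
    }

    /** Hypothetical helper: builds the settings and refuses unsupported ratios. */
    static Properties build(boolean enabled, double mapInput,
                            double mapOutput, double reduceOutput) {
        for (double ratio : new double[] {mapInput, mapOutput, reduceOutput}) {
            if (!isSupportedRatio(ratio)) {
                throw new IllegalArgumentException(
                    "Unsupported compression ratio: " + ratio);
            }
        }
        Properties settings = new Properties();
        settings.setProperty(ENABLE, Boolean.toString(enabled));
        settings.setProperty(MAP_INPUT_RATIO, Double.toString(mapInput));
        settings.setProperty(MAP_OUTPUT_RATIO, Double.toString(mapOutput));
        settings.setProperty(REDUCE_OUTPUT_RATIO, Double.toString(reduceOutput));
        return settings;
    }

    public static void main(String[] args) {
        Properties settings = build(true, 0.5, 0.4, 0.3);
        // Print each setting as name=value.
        for (String name : settings.stringPropertyNames()) {
            System.out.println(name + "=" + settings.getProperty(name));
        }
    }
}
```

In an actual Gridmix run the same name/value pairs would be supplied through the job configuration rather than a Properties object.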

      Description

      Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix emulate load due to data compression as defined by the job-trace.

          Activity

          Amar Kamat added a comment -

          The goal of this JIRA is to emulate the compression characteristics of a MapReduce job. Emulating compression characteristics involves the following:
          1. Generating compressible data. The compression characteristics (e.g. compression ratio) of the data (map input, map output and reduce output) should be configurable.
          2. Extracting compression-related properties from the original job's configuration and history files, and configuring the simulated job to mimic the compression behavior using them.

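The second step in the comment above (reading compression-related properties from the original job) might look roughly like the sketch below. It is not the code from the patch. The two keys are standard MapReduce property names, but which keys Gridmix actually inspects is an assumption here, as are the class and method names. Detection of map input compression is left out because this issue does not say how it is decided.

```java
import java.util.HashMap;
import java.util.Map;

public class CompressionDetectionSketch {
    // Standard MapReduce property names; assumed (not confirmed by this issue)
    // to be among the keys examined in the original job's configuration.
    static final String MAP_OUTPUT_COMPRESS = "mapreduce.map.output.compress";
    static final String JOB_OUTPUT_COMPRESS =
        "mapreduce.output.fileoutputformat.compress";

    /** Reads a boolean flag, treating a missing key as false. */
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    /** Emulate map output compression only if the original job used it. */
    static boolean emulateMapOutputCompression(Map<String, String> originalJobConf) {
        return flag(originalJobConf, MAP_OUTPUT_COMPRESS);
    }

    /** Emulate reduce output compression only if the original job used it. */
    static boolean emulateReduceOutputCompression(Map<String, String> originalJobConf) {
        return flag(originalJobConf, JOB_OUTPUT_COMPRESS);
    }

    public static void main(String[] args) {
        // A stand-in for the configuration of a traced job.
        Map<String, String> originalJobConf = new HashMap<>();
        originalJobConf.put(MAP_OUTPUT_COMPRESS, "true");

        System.out.println("map output: "
            + emulateMapOutputCompression(originalJobConf));
        System.out.println("reduce output: "
            + emulateReduceOutputCompression(originalJobConf));
    }
}
```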
          Amar Kamat added a comment -

          Attaching a patch implementing compression emulation support in Gridmix. test-patch and ant tests passed. Manually tested the patch.

          Amar Kamat added a comment -

          Running through Hudson.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12480563/MR-2408-gridmix-compression-emulation-v1.1.patch
          against trunk revision 1127444.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.cli.TestMRCLI
          org.apache.hadoop.tools.TestHadoopArchives
          org.apache.hadoop.tools.TestHarFileSystem

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/308//testReport/
          Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/308//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/308//console

          This message is automatically generated.

          Ravi Gummadi added a comment -

          The failed tests are not related to this patch. The Findbugs warnings reported by Hudson are also unrelated to this patch.

          Patch looks good to me. +1

          Amar Kamat added a comment -

          I just committed this to trunk. Thanks Ravi for the review!

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #703 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/703/)
          MAPREDUCE-2408. [Gridmix] Compression emulation in Gridmix. (amarrk)

          amarrk : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1128162
          Files :

          • /hadoop/mapreduce/trunk/CHANGES.txt
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/DistributedCacheEmulator.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GridmixRecord.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/CompressionEmulationUtil.java
          • /hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/gridmix.xml
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestCompressionEmulationUtils.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/AvgRecordFactory.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GridmixJob.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/SleepJob.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/RandomTextDataGenerator.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/LoadJob.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestRandomTextDataGenerator.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/Gridmix.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/FileQueue.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GenerateData.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GenerateDistCacheData.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/InputStriper.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #692 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/692/)
          MAPREDUCE-2408. [Gridmix] Compression emulation in Gridmix. (amarrk)

          amarrk : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1128162
          Files :

          • /hadoop/mapreduce/trunk/CHANGES.txt
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/DistributedCacheEmulator.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GridmixRecord.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/CompressionEmulationUtil.java
          • /hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/gridmix.xml
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestCompressionEmulationUtils.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/AvgRecordFactory.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GridmixJob.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/SleepJob.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/RandomTextDataGenerator.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/LoadJob.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestRandomTextDataGenerator.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/Gridmix.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/FileQueue.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GenerateData.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GenerateDistCacheData.java
          • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/InputStriper.java
          Hong Tang added a comment -

          Looks like I missed this before it got committed. I quickly went through the patch. I like the approach of using a dictionary and empirically matching the compression ratio with the dictionary size. However, I believe the compression ratio would differ across compression codecs (and even for the same codec at different levels). It'd be useful if you could extend CompressionRatioLookupTable so that it takes a compression codec as input (you may only need to support the few most common codecs: lzo, gzip, and bzip2).

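The dictionary idea discussed in the comment above can be illustrated with a small experiment: generate text by drawing words at random from a dictionary of a given size, compress it with gzip, and observe how the ratio changes with the dictionary size. This is a sketch of the general technique only, not the patch's RandomTextDataGenerator or CompressionRatioLookupTable. The class name, word length and data size are arbitrary choices, and the ratio is assumed here to mean compressed size divided by uncompressed size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class DictionaryCompressionSketch {
    /** Builds a dictionary of random lowercase words of fixed length. */
    static String[] buildDictionary(int size, int wordLength, Random rng) {
        String[] words = new String[size];
        for (int i = 0; i < size; i++) {
            StringBuilder word = new StringBuilder(wordLength);
            for (int j = 0; j < wordLength; j++) {
                word.append((char) ('a' + rng.nextInt(26)));
            }
            words[i] = word.toString();
        }
        return words;
    }

    /** Emits space-separated random dictionary words until targetBytes is reached. */
    static byte[] generateText(String[] dictionary, int targetBytes, Random rng) {
        StringBuilder text = new StringBuilder(targetBytes + 16);
        while (text.length() < targetBytes) {
            text.append(dictionary[rng.nextInt(dictionary.length)]).append(' ');
        }
        return text.toString().getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns compressed size divided by uncompressed size, using gzip. */
    static double gzipRatio(byte[] data) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
            gzip.write(data);
        }
        // The stream is closed here, so all compressed bytes have been written.
        return (double) compressed.size() / data.length;
    }

    public static void main(String[] args) throws IOException {
        // A smaller dictionary means more repetition and a lower ratio.
        for (int size : new int[] {1, 10, 100, 1000, 10000}) {
            Random rng = new Random(42);
            String[] dictionary = buildDictionary(size, 10, rng);
            byte[] text = generateText(dictionary, 1 << 20, rng);
            System.out.printf("dictionary=%d ratio=%.3f%n", size, gzipRatio(text));
        }
    }
}
```

Repeating such measurements with a different codec or compression level would produce a different table, which is the point raised in the review.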
          Amar Kamat added a comment -

          Hong,
          Thanks a lot for your review. You are right. The compression ratio table will be different for different codecs. The empirical values table in this patch is computed for the default codec (i.e. Gzip). We have compiled a similar table for LZO, and it seems LZO also shows a pattern in that respect. The plan is to add other codecs incrementally. I will open a JIRA to track LZO compression emulation.

          Amar Kamat added a comment -

          Opened MAPREDUCE-2542 for tracking LZO codec support in Gridmix.


            People

            • Assignee: Amar Kamat
            • Reporter: Ravi Gummadi
            • Votes: 0
            • Watchers: 3
