Hadoop Map/Reduce / MAPREDUCE-2786

TestDFSIO should also test compression reading/writing from command-line.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.2-alpha, 0.23.4
    • Component/s: benchmarks
    • Labels:
    • Hadoop Flags:
      Reviewed
    • Tags:
      testdfsio

      Description

      I think it would be beneficial to alter TestDFSIO so that it accepts any compression codec class through a command-line argument, allowing compression to be tested without changing the config file every time. Something like "-compression" would do.
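      A minimal sketch of the idea, not the code from the attached patches: scan the arguments for the "-compression" flag suggested above, take the next argument as a class name, and resolve it by reflection. The class and method names (CompressionArgSketch, parseCodecName, resolve) are hypothetical, and java.util.zip.GZIPOutputStream stands in for a real Hadoop codec class so that the example runs without Hadoop on the classpath.

```java
import java.io.OutputStream;
import java.util.Optional;

final class CompressionArgSketch {
    // Flag name taken from the suggestion in the description.
    static final String FLAG = "-compression";

    /** Returns the class name that follows the flag, or empty if the flag is absent. */
    static Optional<String> parseCodecName(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (FLAG.equals(args[i])) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("Missing class name after " + FLAG);
                }
                return Optional.of(args[i + 1]);
            }
        }
        return Optional.empty();
    }

    /** Loads the named class and checks that it is a subtype of the expected type. */
    static <T> Class<? extends T> resolve(String name, Class<T> expected)
            throws ClassNotFoundException {
        // asSubclass throws ClassCastException if the class has the wrong type.
        return Class.forName(name).asSubclass(expected);
    }

    public static void main(String[] args) throws Exception {
        // Illustrative argument list; the stand-in class is an assumption.
        String[] demo = {"-write", FLAG, "java.util.zip.GZIPOutputStream"};
        Optional<String> name = parseCodecName(demo);
        if (name.isPresent()) {
            System.out.println("codec class: " + resolve(name.get(), OutputStream.class).getName());
        } else {
            System.out.println("no compression requested");
        }
    }
}
```

      Resolving the class once while parsing arguments lets an unknown or mistyped class name fail before any benchmark I/O starts.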

      1. MAPREDUCE_2786.patch
        5 kB
        Plamen Jeliazkov
      2. MAPREDUCE_2786.patch
        5 kB
        Plamen Jeliazkov
      3. MAPREDUCE_2786.patch
        10 kB
        Plamen Jeliazkov
      4. MAPREDUCE-2786.patch
        7 kB
        Plamen Jeliazkov

        Issue Links

          Activity

          Plamen Jeliazkov added a comment -

          This is my work done so far. I'd like to move the codec into the mapper constructors but I have not been able to do it successfully because the CompressionOutputStream relies on the OutputStream within each mapper.

          Plamen Jeliazkov added a comment -

          Here is a patch for review.

          Konstantin Shvachko added a comment -

          It is good to have an opportunity to benchmark with compression.
          A couple of suggestions.

          1. Move all compression configuration logic, including the reflection calls and the cc variable, all the way to IOMapperBase.configure(). Otherwise all these small actions will be counted as execution time.
          2. You should not work separately with compressed and non-compressed streams inside doIO(). The same out or in variables should just point to either the compressed or the non-compressed stream. Nesting streams is a regular practice.
          3. getCompression() is not used anywhere and should be removed.
          4. You use test.compression to get the codec class and test.io.compression.class to set it. How is this going to work? You should make two constants, one for the property and one for the default value, and use them.
          5. AppendMapper is not covered. It should be the same as the others. Moving the config logic into IOMapperBase should make it easy.
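          Suggestions 1 and 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was committed: the Codec interface is a hypothetical stand-in for Hadoop's codec type, java.util.zip GZIP streams replace a real codec, and in-memory byte arrays replace HDFS files. The codec is chosen once in the constructor (playing the role of configure()), and the I/O methods use a single out or in variable that points at either the raw or the wrapped stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class NestedStreamSketch {
    /** Hypothetical stand-in for a compression codec; not a Hadoop type. */
    interface Codec {
        OutputStream wrapOutput(OutputStream raw) throws IOException;
        InputStream wrapInput(InputStream raw) throws IOException;
    }

    static final Codec GZIP = new Codec() {
        @Override public OutputStream wrapOutput(OutputStream raw) throws IOException {
            return new GZIPOutputStream(raw);
        }
        @Override public InputStream wrapInput(InputStream raw) throws IOException {
            return new GZIPInputStream(raw);
        }
    };

    // Selected once, outside the timed I/O path; null means no compression.
    private final Codec codec;

    NestedStreamSketch(Codec codec) {
        this.codec = codec;
    }

    /** Writes data through one 'out' variable, wrapped only if a codec is set. */
    byte[] write(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream out = sink;
        if (codec != null) {
            out = codec.wrapOutput(out);
        }
        try (OutputStream o = out) {
            o.write(data);
        }
        return sink.toByteArray();
    }

    /** Reads everything back through one 'in' variable, wrapped only if a codec is set. */
    byte[] read(byte[] stored) throws IOException {
        InputStream in = new ByteArrayInputStream(stored);
        if (codec != null) {
            in = codec.wrapInput(in);
        }
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try (InputStream i = in) {
            int n;
            while ((n = i.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000]; // highly compressible: all zeros
        NestedStreamSketch plain = new NestedStreamSketch(null);
        NestedStreamSketch gzip = new NestedStreamSketch(GZIP);
        System.out.println("plain bytes stored: " + plain.write(data).length);
        System.out.println("gzip bytes stored:  " + gzip.write(data).length);
    }
}
```

          Because the read and write loops never branch on whether compression is enabled, the same code path would also serve an append-style mapper, which is the point of suggestion 5.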
          Plamen Jeliazkov added a comment -

          Took all of Konstantin's advice. New patch for review. Let me know how it looks.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12542191/MAPREDUCE_2786.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2781//console

          This message is automatically generated.

          Plamen Jeliazkov added a comment -

          Ah. I need to set up my IDE better; I always seem to have the wrong index when creating patches.
          I'll post a patch with the correct index tomorrow.

          Konstantin Shvachko added a comment -

          A few review comments. (Unlike Jenkins, I could apply your patch.)

          • createControlFile() uses the new compressionClass parameter only to log it.
            You should log compressionClass in run() along with the other input parameters, like bufferSize or baseDir. That should be enough.
          • Don't print it in analyzeResult() either. It is an input parameter, not an output parameter.
            Even if it were necessary, it should be taken from conf.
          • There are whitespace changes inside the loop in doIO() for WriteMapper and ReadMapper. Please avoid them.

          Otherwise it looks good to go.

          Plamen Jeliazkov added a comment -

          This patch should address Konstantin's guidelines and some of the whitespace nits.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543316/MAPREDUCE_2786.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2800//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543318/MAPREDUCE_2786.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2801//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543320/MAPREDUCE_2786.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          -1 javac. The patch appears to cause the build to fail.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2802//console

          This message is automatically generated.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543323/MAPREDUCE_2786.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2808//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2808//console

          This message is automatically generated.

          Konstantin Shvachko added a comment -

          +1 from me too

          Konstantin Shvachko added a comment -

          I just committed this to branch-2 and trunk.
          Thank you Plamen.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2674 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2674/)
          MAPREDUCE-2786. Add compression option for TestDFSIO. Contributed by Plamen Jeliazkov. (Revision 1380310)

          Result = SUCCESS
          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1380310
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2737 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2737/)
          MAPREDUCE-2786. Add compression option for TestDFSIO. Contributed by Plamen Jeliazkov. (Revision 1380310)

          Result = SUCCESS
          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1380310
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2699 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2699/)
          MAPREDUCE-2786. Add compression option for TestDFSIO. Contributed by Plamen Jeliazkov. (Revision 1380310)

          Result = FAILURE
          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1380310
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1155 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1155/)
          MAPREDUCE-2786. Add compression option for TestDFSIO. Contributed by Plamen Jeliazkov. (Revision 1380310)

          Result = SUCCESS
          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1380310
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1186 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1186/)
          MAPREDUCE-2786. Add compression option for TestDFSIO. Contributed by Plamen Jeliazkov. (Revision 1380310)

          Result = SUCCESS
          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1380310
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java
          Konstantin Shvachko added a comment -

          Committed this to branch-0.23 to avoid discrepancies with MAPREDUCE-4651.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #386 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/386/)
          MAPREDUCE-2786. Add compression option for TestDFSIO. Contributed by Plamen Jeliazkov. (Revision 1390150)

          Result = UNSTABLE
          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390150
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/IOMapperBase.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java

            People

            • Assignee: Plamen Jeliazkov
            • Reporter: Plamen Jeliazkov
            • Votes: 0
            • Watchers: 7


                Time Tracking

                Estimated: 36h
                Remaining: 36h
                Logged: Not Specified
