Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-5069

add concrete common implementations of CombineFileInputFormat

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.3-alpha
    • Fix Version/s: 2.1.0-beta
    • Component/s: mrv1, mrv2
    • Labels:
      None

      Description

      CombineFileInputFormat is abstract, and its specific equivalents to TextInputFormat, SequenceFileInputFormat, etc. are currently not in the hadoop code base.

      These sound like very common need wherever CombineFileInputFormat is used, and different folks would write the same code over and over to achieve the same goal. It sounds very natural for hadoop to provide at least the text and sequence file implementations of the CombineFileInputFormat class.

      1. MAPREDUCE-5069.patch
        58 kB
        Sangjin Lee
      2. MAPREDUCE-5069-1.patch
        57 kB
        Sangjin Lee
      3. MAPREDUCE-5069-2.patch
        59 kB
        Sangjin Lee
      4. MAPREDUCE-5069-3.patch
        53 kB
        Sangjin Lee
      5. MAPREDUCE-5069-4.patch
        53 kB
        Sangjin Lee
      6. MAPREDUCE-5069-5.patch
        54 kB
        Sangjin Lee
      7. MAPREDUCE-5069-6.patch
        54 kB
        Sangjin Lee

        Activity

        Hide
        Sangjin Lee added a comment -

        I'd be happy to work out a patch and provide it if folks think it's a good idea. Thanks!

        Show
        Sangjin Lee added a comment - I'd be happy to work out a patch and provide it if folks think it's a good idea. Thanks!
        Hide
        Sandy Ryza added a comment -

        +1 to the idea, Sangjin

        Show
        Sandy Ryza added a comment - +1 to the idea, Sangjin
        Hide
        Chris Nauroth added a comment -

        +1 on the idea from me too. This is something that I would find useful.

        Show
        Chris Nauroth added a comment - +1 on the idea from me too. This is something that I would find useful.
        Hide
        Robert Joseph Evans added a comment -

        +1 on the idea too.

        Show
        Robert Joseph Evans added a comment - +1 on the idea too.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12574275/MAPREDUCE-5069.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified test files.

        -1 one of tests included doesn't have a timeout.

        -1 javac. The patch appears to cause the build to fail.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3430//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574275/MAPREDUCE-5069.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. -1 one of tests included doesn't have a timeout. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3430//console This message is automatically generated.
        Hide
        Sangjin Lee added a comment -

        Updated the patch (casting was breaking the compilation). Got bitten by the eclipse JDT compiler.

        Show
        Sangjin Lee added a comment - Updated the patch (casting was breaking the compilation). Got bitten by the eclipse JDT compiler.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12574281/MAPREDUCE-5069-1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified test files.

        -1 one of tests included doesn't have a timeout.

        -1 javac. The applied patch generated 1366 javac compiler warnings (more than the trunk's current 1362 warnings).

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

        org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser
        org.apache.hadoop.mapreduce.lib.db.TestDataDrivenDBInputFormat
        org.apache.hadoop.mapred.TestFieldSelection
        org.apache.hadoop.mapreduce.TestLocalRunner
        org.apache.hadoop.mapred.TestUserDefinedCounters
        org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution
        org.apache.hadoop.mapreduce.TestMROutputFormat
        org.apache.hadoop.mapred.TestLineRecordReader
        org.apache.hadoop.mapred.TestCombineTextInputFormat
        org.apache.hadoop.mapreduce.lib.fieldsel.TestMRFieldSelection
        org.apache.hadoop.mapreduce.lib.map.TestMultithreadedMapper
        org.apache.hadoop.mapred.lib.TestChainMapReduce
        org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
        org.apache.hadoop.mapreduce.TestMapReduce
        org.apache.hadoop.mapred.TestLazyOutput
        org.apache.hadoop.mapreduce.lib.join.TestJoinDatamerge
        org.apache.hadoop.mapred.lib.TestKeyFieldBasedComparator
        org.apache.hadoop.mapred.lib.TestMultithreadedMapRunner
        org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
        org.apache.hadoop.mapreduce.TestChild
        org.apache.hadoop.mapred.TestReduceFetch
        org.apache.hadoop.mapreduce.lib.input.TestCombineTextInputFormat
        org.apache.hadoop.mapred.lib.TestMultipleOutputs
        org.apache.hadoop.mapred.TestJavaSerialization
        org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader
        org.apache.hadoop.mapreduce.security.ssl.TestEncryptedShuffle
        org.apache.hadoop.mapreduce.lib.output.TestMRMultipleOutputs
        org.apache.hadoop.mapred.TestClusterMapReduceTestCase
        org.apache.hadoop.mapred.TestCollect
        org.apache.hadoop.mapred.join.TestDatamerge
        org.apache.hadoop.mapreduce.TestMapCollection
        org.apache.hadoop.mapred.TestMiniMRClientCluster
        org.apache.hadoop.mapred.TestMapRed
        org.apache.hadoop.mapred.TestFileOutputFormat
        org.apache.hadoop.mapreduce.TestValueIterReset
        org.apache.hadoop.mapreduce.v2.TestMROldApiJobs
        org.apache.hadoop.mapred.TestJobCounters
        org.apache.hadoop.mapreduce.v2.TestUberAM
        org.apache.hadoop.conf.TestNoDefaultsJobConf
        org.apache.hadoop.mapred.TestReporter
        org.apache.hadoop.mapreduce.lib.partition.TestMRKeyFieldBasedComparator
        org.apache.hadoop.mapreduce.lib.chain.TestChainErrors
        org.apache.hadoop.mapreduce.lib.chain.TestSingleElementChain
        org.apache.hadoop.mapreduce.lib.input.TestMultipleInputs
        org.apache.hadoop.mapreduce.v2.TestMRJobs
        org.apache.hadoop.mapred.TestComparators
        org.apache.hadoop.mapreduce.lib.chain.TestMapReduceChain
        org.apache.hadoop.mapred.jobcontrol.TestLocalJobControl
        org.apache.hadoop.mapreduce.TestMapReduceLazyOutput
        org.apache.hadoop.mapreduce.lib.jobcontrol.TestMapReduceJobControl

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3431//testReport/
        Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3431//artifact/trunk/patchprocess/diffJavacWarnings.txt
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3431//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574281/MAPREDUCE-5069-1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. -1 one of tests included doesn't have a timeout. -1 javac . The applied patch generated 1366 javac compiler warnings (more than the trunk's current 1362 warnings). +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser org.apache.hadoop.mapreduce.lib.db.TestDataDrivenDBInputFormat org.apache.hadoop.mapred.TestFieldSelection org.apache.hadoop.mapreduce.TestLocalRunner org.apache.hadoop.mapred.TestUserDefinedCounters org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapreduce.TestMROutputFormat org.apache.hadoop.mapred.TestLineRecordReader org.apache.hadoop.mapred.TestCombineTextInputFormat org.apache.hadoop.mapreduce.lib.fieldsel.TestMRFieldSelection org.apache.hadoop.mapreduce.lib.map.TestMultithreadedMapper org.apache.hadoop.mapred.lib.TestChainMapReduce org.apache.hadoop.mapreduce.security.TestBinaryTokenFile org.apache.hadoop.mapreduce.TestMapReduce org.apache.hadoop.mapred.TestLazyOutput org.apache.hadoop.mapreduce.lib.join.TestJoinDatamerge org.apache.hadoop.mapred.lib.TestKeyFieldBasedComparator org.apache.hadoop.mapred.lib.TestMultithreadedMapRunner org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapreduce.TestChild org.apache.hadoop.mapred.TestReduceFetch org.apache.hadoop.mapreduce.lib.input.TestCombineTextInputFormat org.apache.hadoop.mapred.lib.TestMultipleOutputs org.apache.hadoop.mapred.TestJavaSerialization org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader org.apache.hadoop.mapreduce.security.ssl.TestEncryptedShuffle org.apache.hadoop.mapreduce.lib.output.TestMRMultipleOutputs org.apache.hadoop.mapred.TestClusterMapReduceTestCase org.apache.hadoop.mapred.TestCollect org.apache.hadoop.mapred.join.TestDatamerge org.apache.hadoop.mapreduce.TestMapCollection org.apache.hadoop.mapred.TestMiniMRClientCluster org.apache.hadoop.mapred.TestMapRed org.apache.hadoop.mapred.TestFileOutputFormat org.apache.hadoop.mapreduce.TestValueIterReset org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapred.TestJobCounters org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.conf.TestNoDefaultsJobConf org.apache.hadoop.mapred.TestReporter org.apache.hadoop.mapreduce.lib.partition.TestMRKeyFieldBasedComparator org.apache.hadoop.mapreduce.lib.chain.TestChainErrors org.apache.hadoop.mapreduce.lib.chain.TestSingleElementChain org.apache.hadoop.mapreduce.lib.input.TestMultipleInputs org.apache.hadoop.mapreduce.v2.TestMRJobs org.apache.hadoop.mapred.TestComparators org.apache.hadoop.mapreduce.lib.chain.TestMapReduceChain org.apache.hadoop.mapred.jobcontrol.TestLocalJobControl org.apache.hadoop.mapreduce.TestMapReduceLazyOutput org.apache.hadoop.mapreduce.lib.jobcontrol.TestMapReduceJobControl +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3431//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3431//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3431//console This message is automatically generated.
        Hide
        Sangjin Lee added a comment -

        Fixed the test that was failing (due to the order of files returned).

        Show
        Sangjin Lee added a comment - Fixed the test that was failing (due to the order of files returned).
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12574303/MAPREDUCE-5069-2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified test files.

        -1 one of tests included doesn't have a timeout.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

        org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser
        org.apache.hadoop.mapreduce.lib.db.TestDataDrivenDBInputFormat
        org.apache.hadoop.mapred.TestFieldSelection
        org.apache.hadoop.mapreduce.TestLocalRunner
        org.apache.hadoop.mapred.TestUserDefinedCounters
        org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution
        org.apache.hadoop.mapreduce.TestMROutputFormat
        org.apache.hadoop.mapred.TestLineRecordReader
        org.apache.hadoop.mapreduce.lib.fieldsel.TestMRFieldSelection
        org.apache.hadoop.mapreduce.lib.map.TestMultithreadedMapper
        org.apache.hadoop.mapred.lib.TestChainMapReduce
        org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
        org.apache.hadoop.mapreduce.TestMapReduce
        org.apache.hadoop.mapred.TestLazyOutput
        org.apache.hadoop.mapreduce.lib.join.TestJoinDatamerge
        org.apache.hadoop.mapred.lib.TestKeyFieldBasedComparator
        org.apache.hadoop.mapred.lib.TestMultithreadedMapRunner
        org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
        org.apache.hadoop.mapreduce.TestChild
        org.apache.hadoop.mapred.TestReduceFetch
        org.apache.hadoop.mapred.lib.TestMultipleOutputs
        org.apache.hadoop.mapred.TestJavaSerialization
        org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader
        org.apache.hadoop.mapreduce.security.ssl.TestEncryptedShuffle
        org.apache.hadoop.mapreduce.lib.output.TestMRMultipleOutputs
        org.apache.hadoop.mapred.TestClusterMapReduceTestCase
        org.apache.hadoop.mapred.TestCollect
        org.apache.hadoop.mapred.join.TestDatamerge
        org.apache.hadoop.mapreduce.TestMapCollection
        org.apache.hadoop.mapred.TestMiniMRClientCluster
        org.apache.hadoop.mapred.TestMapRed
        org.apache.hadoop.mapred.TestFileOutputFormat
        org.apache.hadoop.mapreduce.TestValueIterReset
        org.apache.hadoop.mapreduce.v2.TestMROldApiJobs
        org.apache.hadoop.mapred.TestJobCounters
        org.apache.hadoop.mapreduce.v2.TestUberAM
        org.apache.hadoop.conf.TestNoDefaultsJobConf
        org.apache.hadoop.mapred.TestReporter
        org.apache.hadoop.mapreduce.lib.partition.TestMRKeyFieldBasedComparator
        org.apache.hadoop.mapreduce.lib.chain.TestChainErrors
        org.apache.hadoop.mapreduce.lib.chain.TestSingleElementChain
        org.apache.hadoop.mapreduce.lib.input.TestMultipleInputs
        org.apache.hadoop.mapreduce.v2.TestMRJobs
        org.apache.hadoop.mapred.TestComparators
        org.apache.hadoop.mapreduce.lib.chain.TestMapReduceChain
        org.apache.hadoop.mapred.jobcontrol.TestLocalJobControl
        org.apache.hadoop.mapreduce.TestMapReduceLazyOutput
        org.apache.hadoop.mapreduce.lib.jobcontrol.TestMapReduceJobControl

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3432//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3432//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574303/MAPREDUCE-5069-2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. -1 one of tests included doesn't have a timeout. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser org.apache.hadoop.mapreduce.lib.db.TestDataDrivenDBInputFormat org.apache.hadoop.mapred.TestFieldSelection org.apache.hadoop.mapreduce.TestLocalRunner org.apache.hadoop.mapred.TestUserDefinedCounters org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapreduce.TestMROutputFormat org.apache.hadoop.mapred.TestLineRecordReader org.apache.hadoop.mapreduce.lib.fieldsel.TestMRFieldSelection org.apache.hadoop.mapreduce.lib.map.TestMultithreadedMapper org.apache.hadoop.mapred.lib.TestChainMapReduce org.apache.hadoop.mapreduce.security.TestBinaryTokenFile org.apache.hadoop.mapreduce.TestMapReduce org.apache.hadoop.mapred.TestLazyOutput org.apache.hadoop.mapreduce.lib.join.TestJoinDatamerge org.apache.hadoop.mapred.lib.TestKeyFieldBasedComparator org.apache.hadoop.mapred.lib.TestMultithreadedMapRunner org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapreduce.TestChild org.apache.hadoop.mapred.TestReduceFetch org.apache.hadoop.mapred.lib.TestMultipleOutputs org.apache.hadoop.mapred.TestJavaSerialization org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader org.apache.hadoop.mapreduce.security.ssl.TestEncryptedShuffle org.apache.hadoop.mapreduce.lib.output.TestMRMultipleOutputs org.apache.hadoop.mapred.TestClusterMapReduceTestCase org.apache.hadoop.mapred.TestCollect org.apache.hadoop.mapred.join.TestDatamerge org.apache.hadoop.mapreduce.TestMapCollection org.apache.hadoop.mapred.TestMiniMRClientCluster org.apache.hadoop.mapred.TestMapRed org.apache.hadoop.mapred.TestFileOutputFormat org.apache.hadoop.mapreduce.TestValueIterReset org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapred.TestJobCounters org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.conf.TestNoDefaultsJobConf org.apache.hadoop.mapred.TestReporter org.apache.hadoop.mapreduce.lib.partition.TestMRKeyFieldBasedComparator org.apache.hadoop.mapreduce.lib.chain.TestChainErrors org.apache.hadoop.mapreduce.lib.chain.TestSingleElementChain org.apache.hadoop.mapreduce.lib.input.TestMultipleInputs org.apache.hadoop.mapreduce.v2.TestMRJobs org.apache.hadoop.mapred.TestComparators org.apache.hadoop.mapreduce.lib.chain.TestMapReduceChain org.apache.hadoop.mapred.jobcontrol.TestLocalJobControl org.apache.hadoop.mapreduce.TestMapReduceLazyOutput org.apache.hadoop.mapreduce.lib.jobcontrol.TestMapReduceJobControl +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3432//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3432//console This message is automatically generated.
        Hide
        Sangjin Lee added a comment -

        Hmm... The build is clean, and all the tests that I have added are green. However, it seems that somehow more unit tests are run than before, many of which are failing. Prior to the patch 210 unit tests run. After the patch, 464 unit tests run. I believe all the failing tests are contained in that difference.

        https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3431/testReport/
        https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3432/testReport/

        I'm unsure why this patch causes more tests to run? Is hadoop-mapreduce-client-jobclient not the right place to put the unit tests for combine file input format tests?

        Show
        Sangjin Lee added a comment - Hmm... The build is clean, and all the tests that I have added are green. However, it seems that somehow more unit tests are run than before, many of which are failing. Prior to the patch 210 unit tests run. After the patch, 464 unit tests run. I believe all the failing tests are contained in that difference. https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3431/testReport/ https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3432/testReport/ I'm unsure why this patch causes more tests to run? Is hadoop-mapreduce-client-jobclient not the right place to put the unit tests for combine file input format tests?
        Hide
        Chris Nauroth added a comment -

        Hi, Sangjin. I suspect the test failures are unrelated to your patch. I've seen multiple new test failures that I believe were introduced by the MAPREDUCE-5028 patch. I've asked about it on that jira.

        Show
        Chris Nauroth added a comment - Hi, Sangjin. I suspect the test failures are unrelated to your patch. I've seen multiple new test failures that I believe were introduced by the MAPREDUCE-5028 patch. I've asked about it on that jira.
        Hide
        Sangjin Lee added a comment -

        Oh OK. Thanks. I'd appreciate you guys' review of the patch and feedback.

        Show
        Sangjin Lee added a comment - Oh OK. Thanks. I'd appreciate you guys' review of the patch and feedback.
        Hide
        Sandy Ryza added a comment -

        Hi Sangjin,

        Overall the patch looks great. A couple questions:

        • Why is a RecordReaderWrapper needed? Why not just just return the relevant RecordReader directly?
        • Assuming there's a good reason for the above, can SequenceFileRecordReaderWrapper and TextFileRecordReaderWrapper go as private inner classes into their corresponding CombineFileInputFormats? Is there a reason they need to be public?
        Show
        Sandy Ryza added a comment - Hi Sangjin, Overall the patch looks great. A couple questions: Why is a RecordReaderWrapper needed? Why not just just return the relevant RecordReader directly? Assuming there's a good reason for the above, can SequenceFileRecordReaderWrapper and TextFileRecordReaderWrapper go as private inner classes into their corresponding CombineFileInputFormats? Is there a reason they need to be public?
        Hide
        Sangjin Lee added a comment -

        Those are great questions. There is some method to the madness. They have to do with the way CombineFileRecordReader works. First of all, it requires a specific constructor from the record reader:

        • old: (CombineFileSplit, Configuration, Reporter, Integer)
        • new: (CombineFileSplit, TaskAttemptContext, Integer)

        So the record reader that gets passed into the CombineFileRecordReader needs to deal with the CombineFileSplit. That's why the relevant record readers cannot be used directly.

        The second question also has something to do with the constructor requirement. I suppose they could be made private static classes (but not a non-static inner class, due to the constructor requirement). Initially I created them as public in case someone may want to subclass them. But I acknowledge that the probability is pretty remote, considering these are real thin.

        Let me tinker with it a little bit and fold them into the input format classes.

        Show
        Sangjin Lee added a comment - Those are great questions. There is some method to the madness. They have to do with the way CombineFileRecordReader works. First of all, it requires a specific constructor from the record reader: old: (CombineFileSplit, Configuration, Reporter, Integer) new: (CombineFileSplit, TaskAttemptContext, Integer) So the record reader that gets passed into the CombineFileRecordReader needs to deal with the CombineFileSplit. That's why the relevant record readers cannot be used directly. The second question also has something to do with the constructor requirement. I suppose they could be made private static classes (but not a non-static inner class, due to the constructor requirement). Initially I created them as public in case someone may want to subclass them. But I acknowledge that the probability is pretty remote, considering these are real thin. Let me tinker with it a little bit and fold them into the input format classes.
        Hide
        Sangjin Lee added a comment -

        Specific record reader wrapper classes are now made private, internal to the corresponding input format classes.

        Show
        Sangjin Lee added a comment - Specific record reader wrapper classes are now made private, internal to the corresponding input format classes.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12574448/MAPREDUCE-5069-3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified test files.

        -1 one of tests included doesn't have a timeout.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3436//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3436//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574448/MAPREDUCE-5069-3.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. -1 one of tests included doesn't have a timeout. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3436//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3436//console This message is automatically generated.
        Hide
        Sangjin Lee added a comment -

        Hopefully one last time, to silence the warning about the timeout.

        Show
        Sangjin Lee added a comment - Hopefully one last time, to silence the warning about the timeout.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12574468/MAPREDUCE-5069-4.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified test files.

        +1 tests included appear to have a timeout.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3438//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3438//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574468/MAPREDUCE-5069-4.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. +1 tests included appear to have a timeout. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3438//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3438//console This message is automatically generated.
        Hide
        Sangjin Lee added a comment -

        Could you get your reviews and +/-1's?

        Show
        Sangjin Lee added a comment - Could you get your reviews and +/-1's?
        Hide
        Sangjin Lee added a comment -

        Ping? Should I assign it to myself? Would I be able to commit it myself after getting your reviews? I am an apache committer. Thanks.

        Show
        Sangjin Lee added a comment - Ping? Should I assign it to myself? Would I be able to commit it myself after getting your reviews? I am an apache committer. Thanks.
        Hide
        Sandy Ryza added a comment -

        Sangjin,

        I think you're more likely to get a response from a committer by shooting an email to mapreduce-dev list.

        Another nit about the patch: it might make sense to rename RecordReaderWrapper to something like CombineFileRecordReaderWrapper so that it's clear it's something specific to CombineFileInputFormat.

        Show
        Sandy Ryza added a comment - Sangjin, I think you're more likely to get a response from a committer by shooting an email to mapreduce-dev list. Another nit about the patch: it might make sense to rename RecordReaderWrapper to something like CombineFileRecordReaderWrapper so that it's clear it's something specific to CombineFileInputFormat.
        Hide
        Sangjin Lee added a comment -

        Thanks for the suggestions Sandy. I'm going to update the patch. I'll also hit up the mapreduce dev list.

        Show
        Sangjin Lee added a comment - Thanks for the suggestions Sandy. I'm going to update the patch. I'll also hit up the mapreduce dev list.
        Hide
        Sangjin Lee added a comment -

        Renamed RecordReaderWrapper, and added annotations.

        Show
        Sangjin Lee added a comment - Renamed RecordReaderWrapper, and added annotations.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12575922/MAPREDUCE-5069-5.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 eclipse:eclipse. The patch failed to build with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3478//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3478//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575922/MAPREDUCE-5069-5.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. -1 eclipse:eclipse . The patch failed to build with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3478//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3478//console This message is automatically generated.
        Hide
        Robert Joseph Evans added a comment -

        I took a quick look at this and the code seems fine. I want to dig into it in more detail to be sure I understand exactly what is happening.

        Show
        Robert Joseph Evans added a comment - I took a quick look at this and the code seems fine. I want to dig into it in more detail to be sure I understand exactly what is happening.
        Hide
        Sangjin Lee added a comment -

        Thanks Robert.

        Show
        Sangjin Lee added a comment - Thanks Robert.
        Hide
        Robert Joseph Evans added a comment -

        I have a few minor comments. Nothing major and nothing that I am very tied to so if you want to push back on them I would be fine checking the code in as is.

        I understand why you wrote CombineFileRecordReaderWrapper instead of extending CombineFileRecordReader, but I am not really sure the name is that clear. CombineFileRecordReaderWrapper is not a wrapper for a CombineFileRecordReader, but is a wrapper for a FileRecordReader that can be used by CombineFileRecordReader. I am not really sure of a clearer name for it though so I think it is OK.

        I think I would prefer to have checkFileSplit be turned into asserts, or possibly remove it entirely. If we have gotten that far we should be controlled by the CombineFileRecordReader, and we are just validating that it has set the values that it should have set already and does currently set. I just don't see many situations out side of testing where we would want to check this. Which is why I would change them to asserts so that for tests we can validate that everything is correct, but there is no overhead when we are not testing.

        I am also not too happy with having Random used in tests. Random makes it very hard to reproduce an error, I have had too many of them fail sporadically, and have to dig through the logs to find the seed, then hard code the seed... Granted that means that Random found a bug that hard codeing it would not, and I realize that, so I'll leave it up to you. I would just rather have the test with a hard coded seed, or not be random at all. But that is just me.

        Show
        Robert Joseph Evans added a comment - I have a few minor comments. Nothing major and nothing that I am very tied to so if you want to push back on them I would be fine checking the code in as is. I understand why you wrote CombineFileRecordReaderWrapper instead of extending CombineFileRecordReader, but I am not really sure the name is that clear. CombineFileRecordReaderWrapper is not a wrapper for a CombineFileRecordReader, but is a wrapper for a FileRecordReader that can be used by CombineFileRecordReader. I am not really sure of a clearer name for it though so I think it is OK. I think I would prefer to have checkFileSplit be turned into asserts, or possibly remove it entirely. If we have gotten that far we should be controlled by the CombineFileRecordReader, and we are just validating that it has set the values that it should have set already and does currently set. I just don't see many situations out side of testing where we would want to check this. Which is why I would change them to asserts so that for tests we can validate that everything is correct, but there is no overhead when we are not testing. I am also not too happy with having Random used in tests. Random makes it very hard to reproduce an error, I have had too many of them fail sporadically, and have to dig through the logs to find the seed, then hard code the seed... Granted that means that Random found a bug that hard codeing it would not, and I realize that, so I'll leave it up to you. I would just rather have the test with a hard coded seed, or not be random at all. But that is just me.
        Hide
        Sangjin Lee added a comment -

        Thanks Robert for detailed feedback! I greatly appreciate it.

        We went back and forth on the name. Initially I named it RecordReaderWrapper, but it wasn't clear that it should be used in the context of CombineFileRecordReader, so CombineFileRecordReaderWrapper was suggested. As you said, I'm also not sure what might be a better name without being too verbose. I'm not married to the current name, but I'm slightly inclined to leaving it as is.

        As for the second point, I agree that the chance of this condition being violated is remote. I'm going to turn this into an assert.

        I also hear you about the random seeds. The only reason I used them that way is frankly I followed the precedent of other similar tests. I hope it's acceptable...

        I'll submit a new version of the patch soon. Thanks again!

        Show
        Sangjin Lee added a comment - Thanks Robert for detailed feedback! I greatly appreciate it. We went back and forth on the name. Initially I named it RecordReaderWrapper, but it wasn't clear that it should be used in the context of CombineFileRecordReader, so CombineFileRecordReaderWrapper was suggested. As you said, I'm also not sure what might be a better name without being too verbose. I'm not married to the current name, but I'm slightly inclined to leaving it as is. As for the second point, I agree that the chance of this condition being violated is remote. I'm going to turn this into an assert. I also hear you about the random seeds. The only reason I used them that way is frankly I followed the precedent of other similar tests. I hope it's acceptable... I'll submit a new version of the patch soon. Thanks again!
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12580209/MAPREDUCE-5069-6.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3543//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3543//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580209/MAPREDUCE-5069-6.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3543//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3543//console This message is automatically generated.
        Hide
        Robert Joseph Evans added a comment -

        The tests pass and the changes look fine to me. +1

        Show
        Robert Joseph Evans added a comment - The tests pass and the changes look fine to me. +1
        Hide
        Robert Joseph Evans added a comment -

        Thanks Sangjin,

        I put this into trunk and branch-2. Thanks for asking for the review too, it is good work and I am glad we could get it in.

        Show
        Robert Joseph Evans added a comment - Thanks Sangjin, I put this into trunk and branch-2. Thanks for asking for the review too, it is good work and I am glad we could get it in.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3654 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3654/)
        MAPREDUCE-5069. add concrete common implementations of CombineFileInputFormat (Sangjin Lee via bobby) (Revision 1471424)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1471424
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineFileRecordReaderWrapper.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineTextInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileRecordReaderWrapper.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineTextInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineTextInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-trunk-Commit #3654 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3654/ ) MAPREDUCE-5069 . add concrete common implementations of CombineFileInputFormat (Sangjin Lee via bobby) (Revision 1471424) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1471424 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineFileRecordReaderWrapper.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineTextInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileRecordReaderWrapper.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineTextInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineTextInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
        Hide
        Sangjin Lee added a comment -

        Thanks! Much appreciated.

        Show
        Sangjin Lee added a comment - Thanks! Much appreciated.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #194 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/194/)
        MAPREDUCE-5069. add concrete common implementations of CombineFileInputFormat (Sangjin Lee via bobby) (Revision 1471424)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1471424
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineFileRecordReaderWrapper.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineTextInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileRecordReaderWrapper.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineTextInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineTextInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Yarn-trunk #194 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/194/ ) MAPREDUCE-5069 . add concrete common implementations of CombineFileInputFormat (Sangjin Lee via bobby) (Revision 1471424) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1471424 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineFileRecordReaderWrapper.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineTextInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileRecordReaderWrapper.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineTextInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineTextInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1383 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1383/)
        MAPREDUCE-5069. add concrete common implementations of CombineFileInputFormat (Sangjin Lee via bobby) (Revision 1471424)

        Result = FAILURE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1471424
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineFileRecordReaderWrapper.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineTextInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileRecordReaderWrapper.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineTextInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineTextInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1383 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1383/ ) MAPREDUCE-5069 . add concrete common implementations of CombineFileInputFormat (Sangjin Lee via bobby) (Revision 1471424) Result = FAILURE bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1471424 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineFileRecordReaderWrapper.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineTextInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileRecordReaderWrapper.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineTextInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineTextInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1410 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1410/)
        MAPREDUCE-5069. add concrete common implementations of CombineFileInputFormat (Sangjin Lee via bobby) (Revision 1471424)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1471424
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineFileRecordReaderWrapper.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineTextInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileRecordReaderWrapper.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineTextInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineTextInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineSequenceFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1410 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1410/ ) MAPREDUCE-5069 . add concrete common implementations of CombineFileInputFormat (Sangjin Lee via bobby) (Revision 1471424) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1471424 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineFileRecordReaderWrapper.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineTextInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileRecordReaderWrapper.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineTextInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCombineTextInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineSequenceFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
        Hide
        Derek Farren added a comment -

        It doesn't work on sequence files because the CombineFileRecordReaderWrapper constructor deals only with first argument type FileInputFormat. It needs another constructor that can deal with SequenceFileInputFormat.

        This is what we have:
        CombineFileRecordReaderWrapper(FileInputFormat<K,V> inputFormat, CombineFileSplit split, TaskAttemptContext context, Integer idx)

        This is what we need to add:
        CombineFileRecordReaderWrapper(SequenceFileInputFormat<K, V> sequenceFileInputFormat, CombineFileSplit split, TaskAttemptContext context, Integer idx)

        Show
        Derek Farren added a comment - It doesn't work on sequence files because the CombineFileRecordReaderWrapper constructor deals only with first argument type FileInputFormat. It needs another constructor that can deal with SequenceFileInputFormat. This is what we have: CombineFileRecordReaderWrapper(FileInputFormat<K,V> inputFormat, CombineFileSplit split, TaskAttemptContext context, Integer idx) This is what we need to add: CombineFileRecordReaderWrapper(SequenceFileInputFormat<K, V> sequenceFileInputFormat, CombineFileSplit split, TaskAttemptContext context, Integer idx)
        Hide
        Jason Lowe added a comment -

        SequenceFileInputFormat extends FileInputFormat, and CombineSequenceFileInputFormat.SequenceFileRecordReaderWrapper is an example where the original CombineFileRecordReaderWrapper constructor is being called with a SequenceFileInputFormat instance.

        Could you elaborate on why you believe this doesn't work on sequence files or provide an example?

        Show
        Jason Lowe added a comment - SequenceFileInputFormat extends FileInputFormat, and CombineSequenceFileInputFormat.SequenceFileRecordReaderWrapper is an example where the original CombineFileRecordReaderWrapper constructor is being called with a SequenceFileInputFormat instance. Could you elaborate on why you believe this doesn't work on sequence files or provide an example?
        Hide
        Derek Farren added a comment -

        You're right. My apologies.

        Show
        Derek Farren added a comment - You're right. My apologies.

          People

          • Assignee:
            Sangjin Lee
            Reporter:
            Sangjin Lee
          • Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development