Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: applicationmaster, mrv2
    • Labels:
      None
    • Target Version/s:

      Description

      Instead of computing the input splits as part of job submission, Hadoop could have a separate "job task type" that computes the input splits, thereby allowing that computation to happen on the cluster.

      1. MAPREDUCE-207.v07.patch
        15 kB
        Gera Shegalov
      2. MAPREDUCE-207.v06.patch
        14 kB
        Gera Shegalov
      3. MAPREDUCE-207.v05.patch
        10 kB
        Gera Shegalov
      4. MAPREDUCE-207.v03.patch
        11 kB
        Gera Shegalov
      5. MAPREDUCE-207.v02.patch
        11 kB
        Gera Shegalov
      6. MAPREDUCE-207.patch
        11 kB
        Arun C Murthy

        Issue Links

          Activity

          Philip Zeyliger added a comment -

          The motivation behind computing the input splits on the cluster is at least two-fold:

          • It would be great to be able to submit jobs to a cluster using a simple (REST?) API, from many languages. (Similar to HADOOP-5633.) The fact that job submission does a bunch of mapreduce-internal work makes such submission very tricky. We're already seeing how workflow systems (here I'm thinking of Oozie and Pig) run MR jobs simply to launch more MR jobs, while inheriting the scheduling and isolation work that the JobTracker already does.
          • Sometimes computing the input splits is, in and of itself, an operation that would do well to be run in parallel across several machines. For example, splitting inputs may require going through many files on the DFS. Moving input split calculations onto the cluster would pave the way for this to be possible.
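          To make the second point concrete: each file can be split independently of the others, so the work parallelizes naturally. The sketch below is a hypothetical illustration, not Hadoop code; FileInfo, Split and the fixed split size are assumptions, and a Java parallel stream stands in for what would be separate tasks on a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration (not Hadoop code): per-file split computation is
// independent, so it can be fanned out. A parallel stream stands in for tasks.
class ParallelSplitSketch {

    /** A file path and length, as a directory listing might report them. */
    record FileInfo(String path, long length) {}

    /** A (path, offset, length) triple standing in for a real InputSplit. */
    record Split(String path, long offset, long length) {}

    /** Cuts one file into fixed-size chunks; the last chunk may be shorter.
     *  An empty file yields no splits in this sketch. */
    static List<Split> splitFile(FileInfo file, long splitSize) {
        List<Split> splits = new ArrayList<>();
        for (long offset = 0; offset < file.length(); offset += splitSize) {
            long len = Math.min(splitSize, file.length() - offset);
            splits.add(new Split(file.path(), offset, len));
        }
        return splits;
    }

    /** Splits all files in parallel; collect() preserves the input file order. */
    static List<Split> computeSplits(List<FileInfo> files, long splitSize) {
        return files.parallelStream()
                .flatMap(f -> splitFile(f, splitSize).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FileInfo> files = List.of(new FileInfo("/data/a", 250), new FileInfo("/data/b", 100));
        for (Split s : computeSplits(files, 100)) {
            System.out.println(s);
        }
    }
}
```

On a real cluster the fan-out unit would more likely be a directory or a batch of files per task, but the independence that makes it possible is the same.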

          Implementation-wise, we already have JOB_SETUP and JOB_CLEANUP tasks, so adding a JOB_SPLIT_CALCULATION task, which could be colocated with JOB_SETUP, makes some sense.

          Hemanth Yamijala added a comment -

          Before we do this, I think we should resolve HADOOP-4421, at least to the extent of agreeing on a design. Adding one more task while we are trying to fix problems with the existing ones might make things a tad more difficult to manage.

          Devaraj Das added a comment -

          Isn't it possible to do this as part of the JOB_SETUP task itself?

          Amareshwari Sriramadasu added a comment -

          Isn't it possible to do this as part of the JOB_SETUP task itself?

          This can be done. We should move out the creation of setup/cleanup tasks from JobInProgress.initTasks().

          Amareshwari Sriramadasu added a comment -

          This can be done. We should move out the creation of setup/cleanup tasks from JobInProgress.initTasks().

          Related jira HADOOP-4472.

          Owen O'Malley added a comment -

          This patch should reintroduce checkInputSplits into org.apache.hadoop.mapreduce.InputFormat. This method should be documented as optional. It will only be invoked if Java code is doing the submission, to detect errors in the user's job configuration, such as a missing or read-protected input directory, before the job is submitted to the cluster.
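          A minimal sketch of what such an optional hook could look like, assuming a plain Java client and local paths. The class names and the checkInputSplits(List<Path>) signature are hypothetical (the comment names the method but not its signature); the default is a no-op, so formats and non-Java clients that skip it lose nothing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical base class: the validation hook is optional and a no-op by default.
abstract class SketchInputFormat {
    /** Optional, client-side only: fail fast on obviously bad input before submission. */
    void checkInputSplits(List<Path> inputDirs) throws IOException {
        // Default: nothing to check; split computation itself happens on the cluster.
    }
}

// Hypothetical file-based format that checks existence and readability of inputs.
class SketchFileInputFormat extends SketchInputFormat {
    @Override
    void checkInputSplits(List<Path> inputDirs) throws IOException {
        for (Path dir : inputDirs) {
            if (!Files.exists(dir)) {
                throw new IOException("Input path does not exist: " + dir);
            }
            if (!Files.isReadable(dir)) {
                throw new IOException("Input path is not readable: " + dir);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("input");
        new SketchFileInputFormat().checkInputSplits(List.of(tmp)); // existing dir: passes
        try {
            new SketchFileInputFormat().checkInputSplits(List.of(tmp.resolve("missing")));
        } catch (IOException e) {
            System.out.println("Rejected before submission: " + e.getMessage());
        }
    }
}
```

Keeping the check separate from split computation is what lets the splits move to the cluster while Java clients still get early, local error reporting.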

          Philip Zeyliger added a comment -

          I've been poking around here and am running into a fair amount of friction with how different task types are managed.

          As far as I can tell, there are several ways that different task types are distinguished:

          • There's a TaskType enum, which contains MAP, REDUCE, JOB_SETUP, JOB_CLEANUP, and TASK_CLEANUP. This is used quite a bit.
          • TaskInProgress has isMapTask(), isJobCleanupTask(), isJobSetupTask(). I believe that TIP can report both isMapTask() and isJobCleanupTask() on the same object and that reduces are implied by !isMapTask().
          • Task uses a hybrid approach. There's MapTask and ReduceTask (a class hierarchy), but there's also isMapTask(), isJobSetupTask(), isTaskCleanupTask(), and isJobCleanupTask().
          • Schedulers and TaskTrackers for the most part only deal with MAP and REDUCE tasks. Really, these are "slot types", since other types of tasks can be run in them. Schedulers are not aware of the "special tasks"; the JobTracker schedules them "manually" on its own.

          Does this sound about right?

          – Philip

          Matei Zaharia added a comment -

          I think that's almost right, Philip. It looks to me like TASK_CLEANUP tasks can be both maps and reduces. The JobTracker will launch them in a reduce slot if they are cleaning up after a reducer. Therefore, isMapTask() might return false when the task is a cleanup task. To check whether a given Task is a plain old map task or plain old reduce task, you can use Task.isMapOrReduce().
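          To illustrate the distinction between what a task does and which slot it occupies, here is a hypothetical model. The TaskType values mirror the enum described above; SlotType, SketchTask and cleanupFor() are invented for illustration, and the slot rule for TASK_CLEANUP follows the description in this comment rather than the actual JobTracker code.

```java
// Hypothetical model: a task's purpose (TaskType) vs. the slot it runs in (SlotType).
enum TaskType { MAP, REDUCE, JOB_SETUP, JOB_CLEANUP, TASK_CLEANUP }

enum SlotType { MAP, REDUCE }

final class SketchTask {
    final TaskType type;
    final SlotType slot;

    private SketchTask(TaskType type, SlotType slot) {
        this.type = type;
        this.slot = slot;
    }

    static SketchTask map() { return new SketchTask(TaskType.MAP, SlotType.MAP); }

    static SketchTask reduce() { return new SketchTask(TaskType.REDUCE, SlotType.REDUCE); }

    /** A cleanup attempt runs in the same kind of slot as the attempt it cleans up after. */
    static SketchTask cleanupFor(SketchTask cleanedUp) {
        return new SketchTask(TaskType.TASK_CLEANUP, cleanedUp.slot);
    }

    /** Slot-based check: also true for a cleanup that occupies a map slot. */
    boolean isMapTask() { return slot == SlotType.MAP; }

    /** True only for a plain map or plain reduce, in the spirit of Task.isMapOrReduce(). */
    boolean isMapOrReduce() { return type == TaskType.MAP || type == TaskType.REDUCE; }

    public static void main(String[] args) {
        SketchTask cleanup = cleanupFor(reduce());
        System.out.println(cleanup.isMapTask() + " " + cleanup.isMapOrReduce());
    }
}
```

Keeping the two notions in separate fields avoids the ambiguity in the thread, where a single boolean has to stand for both "is a map" and "runs in a map slot".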

          This part of the code definitely leaves something to be desired. I believe Arun mentioned he'd look at it as part of JobTracker refactoring in the future.

          Arun C Murthy added a comment -

          This is fairly trivial in MRv2, I'll take a crack at this.

          Arun C Murthy added a comment -

          As foretold, here is a trivial, preliminary patch to move computation of input-splits inside the cluster - something we've craved for a very long time, as evinced by the interest in this jira and the number of times it comes up on user lists.

          This is huge, because it's a significant step towards various improvements such as HTTP-based job submission etc.

          Shameless plug for MRv2 - it took me 15 mins on a Sunday night to get this done... glory to MRv2!

          It needs a tad more work to get delegation tokens on the client side, but it's nearly there.

          Johannes Zillmann added a comment -

          Currently, in our Hadoop applications, we calculate the splits before we submit the job to the client (the client then simply looks up the existing splits). We do this mainly to influence the reducer count based on the number of splits/map tasks.
          If Hadoop does the splitting on the cluster (which makes sense), it would be nice to have a hook to influence the configuration!
          Sometimes it also makes sense for us to decide on the map-reduce assembly after we know the splits (different join strategies for different data constellations).

          Just dumping some ideas here...
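          A sketch of such a hook, assuming it would be called once the splits are known and before tasks are created. SplitAwareConfigurer and ReducersFromSplits are hypothetical names, and the one-reducer-per-N-maps policy is only an example; of the names used, only the property mapreduce.job.reduces is a real Hadoop setting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hook (not an existing Hadoop API): invoked after split computation,
// before task creation, so a job can adapt its configuration to the split count.
interface SplitAwareConfigurer {
    void configure(int numSplits, Map<String, String> conf);
}

/** Example policy: roughly one reducer per N map tasks, at least 1, at most a cap. */
class ReducersFromSplits implements SplitAwareConfigurer {
    private final int mapsPerReducer;
    private final int maxReducers;

    ReducersFromSplits(int mapsPerReducer, int maxReducers) {
        this.mapsPerReducer = mapsPerReducer;
        this.maxReducers = maxReducers;
    }

    @Override
    public void configure(int numSplits, Map<String, String> conf) {
        // ceil(numSplits / mapsPerReducer) using integer arithmetic
        int wanted = (numSplits + mapsPerReducer - 1) / mapsPerReducer;
        int reducers = Math.max(1, Math.min(maxReducers, wanted));
        conf.put("mapreduce.job.reduces", Integer.toString(reducers));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        new ReducersFromSplits(10, 50).configure(95, conf);
        System.out.println(conf);
    }
}
```

Running such a hook on the cluster also covers the "decide the assembly after seeing the splits" case, as long as the hook may only change settings that are still unused at that point.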

          Sandy Ryza added a comment -

          Arun, are you still planning on working on this? If not, do you mind if I pick it up?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12644428/MAPREDUCE-207.v02.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapred.lib.aggregate.TestAggregates
          org.apache.hadoop.mapreduce.lib.db.TestDataDrivenDBInputFormat
          org.apache.hadoop.mapred.TestFieldSelection
          org.apache.hadoop.mapred.TestOldCombinerGrouping
          org.apache.hadoop.mapreduce.TestLocalRunner
          org.apache.hadoop.mapred.TestUserDefinedCounters
          org.apache.hadoop.mapreduce.TestMROutputFormat
          org.apache.hadoop.mapreduce.lib.fieldsel.TestMRFieldSelection
          org.apache.hadoop.mapred.TestLocalMRNotification
          org.apache.hadoop.mapred.TestLineRecordReaderJobs
          org.apache.hadoop.mapreduce.lib.map.TestMultithreadedMapper
          org.apache.hadoop.mapreduce.TestNewCombinerGrouping
          org.apache.hadoop.mapred.lib.TestChainMapReduce
          org.apache.hadoop.mapreduce.TestMapReduce
          org.apache.hadoop.mapreduce.lib.join.TestJoinDatamerge
          org.apache.hadoop.mapred.lib.TestKeyFieldBasedComparator
          org.apache.hadoop.mapred.lib.TestMultithreadedMapRunner
          org.apache.hadoop.mapreduce.TestMapperReducerCleanup
          org.apache.hadoop.mapred.lib.TestMultipleOutputs
          org.apache.hadoop.mapred.TestJavaSerialization
          org.apache.hadoop.mapreduce.lib.output.TestMRMultipleOutputs
          org.apache.hadoop.mapred.TestCollect
          org.apache.hadoop.mapred.join.TestDatamerge
          org.apache.hadoop.mapreduce.TestMapCollection
          org.apache.hadoop.mapreduce.lib.aggregate.TestMapReduceAggregates
          org.apache.hadoop.mapred.TestMapRed
          org.apache.hadoop.mapred.TestFileOutputFormat
          org.apache.hadoop.mapreduce.TestValueIterReset
          org.apache.hadoop.mapred.TestMapOutputType
          org.apache.hadoop.mapred.TestJobCounters
          org.apache.hadoop.conf.TestNoDefaultsJobConf
          org.apache.hadoop.mapred.TestReporter
          org.apache.hadoop.mapreduce.lib.partition.TestMRKeyFieldBasedComparator
          org.apache.hadoop.mapreduce.lib.chain.TestChainErrors
          org.apache.hadoop.mapreduce.lib.chain.TestSingleElementChain
          org.apache.hadoop.mapreduce.lib.input.TestMultipleInputs
          org.apache.hadoop.mapred.TestComparators
          org.apache.hadoop.mapreduce.lib.input.TestLineRecordReaderJobs
          org.apache.hadoop.mapreduce.lib.chain.TestMapReduceChain
          org.apache.hadoop.mapred.jobcontrol.TestLocalJobControl
          org.apache.hadoop.mapreduce.lib.jobcontrol.TestMapReduceJobControl

          The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt
          org.apache.hadoop.mapreduce.v2.app.job.impl.TestMapReduceChildJVM
          org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebApp
          org.apache.hadoop.mapreduce.v2.app.TestMRClientService
          org.apache.hadoop.mapreduce.v2.app.TestKill
          org.apache.hadoop.mapreduce.v2.app.TestMRApp
          org.apache.hadoop.mapreduce.v2.app.TestJobEndNotifier
          org.apache.hadoop.mapreduce.v2.app.TestFail
          org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup
          org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
          org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncher
          org.apache.hadoop.mapreduce.v2.app.TestRecovery
          org.apache.hadoop.mapreduce.v2.app.TestAMInfos
          org.apache.hadoop.mapreduce.v2.app.TestMRAppComponentDependencies
          org.apache.hadoop.mapreduce.v2.app.TestFetchFailure
          org.apache.hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp
          org.apache.hadoop.mapred.pipes.TestPipeApplication

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4595//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4595//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4595//console

          This message is automatically generated.

          Gera Shegalov added a comment -

          v03 to handle local jobs correctly

          Gera Shegalov added a comment -

          Steve Loughran, thanks for your comment in MAPREDUCE-5887. Moving it here.

          One test to try there is what happens when the block size is reported as very, very small (you can configure this in swiftfs). In the client, this will cause the submitting process to OOM and fail. Presumably the same outcome in the AM is the simplest to implement; we just need to make sure that YARN recognises this as a failure and only retries a couple of times.

          OOMs, like any other AM failure, are treated as an application attempt failure (yarn.resourcemanager.am.max-attempts). We've experienced such issues in production, and they are usually only indirectly related to splits, i.e. the job state comprising all map and reduce attempts is too big for the default MR-AM container size.
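          Why a tiny reported block size hurts: FileInputFormat derives the split size as max(minSize, min(maxSize, blockSize)) and lets the last split absorb up to 10% extra (a "slop" factor of 1.1). The sketch below reproduces that arithmetic for a single file, simplified (no block locations, no splitability check), and only counts splits rather than allocating them. Raising mapreduce.input.fileinputformat.split.minsize puts a floor under the split size.

```java
// Simplified model of FileInputFormat's split sizing for one file, used to show
// how the split count explodes when the reported block size is tiny.
class SplitCountSketch {
    static final double SPLIT_SLOP = 1.1; // the last split may be up to 10% larger

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Counts the splits one file would produce, without materializing them. */
    static long countSplits(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long count = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            count++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            count++; // tail split, possibly up to SPLIT_SLOP * splitSize long
        }
        return count;
    }

    public static void main(String[] args) {
        long kib = 1L << 10, mib = 1L << 20, gib = 1L << 30;
        System.out.println(countSplits(gib, 128 * mib, 1, Long.MAX_VALUE));       // 8
        System.out.println(countSplits(gib, kib, 1, Long.MAX_VALUE));             // 1048576
        System.out.println(countSplits(gib, kib, 128 * mib, Long.MAX_VALUE));     // 8
        System.out.println(countSplits(140 * mib, 128 * mib, 1, Long.MAX_VALUE)); // 1
    }
}
```

With 1 KiB blocks a single 1 GiB file already yields over a million split objects, which is the kind of state that exhausts the client's or the AM's heap.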

          Before doing the work on moving split calculation to MR-AM, I was actually thinking about auto-tuning yarn.app.mapreduce.am.resource.mb and Xmx opts in JobSubmitter. However, even if the split calculation happens in AM, we can come up with an AM-RM RPC like "start a new attempt with the new settings".
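          A rough sketch of the auto-tuning idea, assuming AM memory grows roughly linearly with the number of tasks it tracks. The property names yarn.app.mapreduce.am.resource.mb and yarn.app.mapreduce.am.command-opts are real MR settings; BASE_MB, BYTES_PER_TASK and HEAP_RATIO are placeholder constants, not measured values, and the linear model itself is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical heuristic: size the MR AM container from the task count.
// All three constants are illustrative placeholders, not tuned values.
class AmSizingSketch {
    static final long BASE_MB = 1024;             // assumed fixed AM overhead
    static final long BYTES_PER_TASK = 64L * 1024; // assumed per-task bookkeeping cost
    static final double HEAP_RATIO = 0.8;          // leave headroom for non-heap memory

    static Map<String, String> suggest(long numMaps, long numReduces) {
        long tasks = numMaps + numReduces;
        long mib = 1L << 20;
        long extraMb = (tasks * BYTES_PER_TASK + mib - 1) / mib; // round up to whole MiB
        long containerMb = BASE_MB + extraMb;
        long heapMb = (long) (containerMb * HEAP_RATIO);
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("yarn.app.mapreduce.am.resource.mb", Long.toString(containerMb));
        conf.put("yarn.app.mapreduce.am.command-opts", "-Xmx" + heapMb + "m");
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(suggest(160000, 500));
    }
}
```

Whether this runs in JobSubmitter or, after cluster-side split computation, as part of a request for a new AM attempt, the inputs are the same: the task counts that are only known once the splits exist.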

          Gera Shegalov added a comment -

          Hadoop QA did not kick in; re-uploading the same v03 patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12645924/MAPREDUCE-207.v03.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapreduce.v2.app.TestJobEndNotifier
          org.apache.hadoop.mapreduce.v2.app.TestRecovery
          org.apache.hadoop.mapreduce.v2.app.TestMRAppComponentDependencies
          org.apache.hadoop.mapreduce.v2.app.TestMRApp
          org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
          org.apache.hadoop.mapreduce.v2.app.TestFail
          org.apache.hadoop.mapreduce.v2.app.TestFetchFailure
          org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt
          org.apache.hadoop.mapreduce.v2.app.job.impl.TestMapReduceChildJVM
          org.apache.hadoop.mapreduce.v2.app.TestMRClientService
          org.apache.hadoop.mapreduce.v2.app.TestAMInfos
          org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebApp
          org.apache.hadoop.mapreduce.v2.app.TestKill
          org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup
          org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncher
          org.apache.hadoop.mapred.pipes.TestPipeApplication
          org.apache.hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4614//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4614//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4614//console

          This message is automatically generated.

          Gera Shegalov added a comment -

          v05 patch, to restore the existing behavior of not adding job.split as local resource for non-AM containers.

          Show
          Gera Shegalov added a comment - v05 patch, to restore the existing behavior of not adding job.split as local resource for non-AM containers.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12646848/MAPREDUCE-207.v05.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapred.pipes.TestPipeApplication

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4624//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4624//console

          This message is automatically generated.

          Gera Shegalov added a comment -

          Assuming that the TestPipeApplication timeout is the known issue MAPREDUCE-5868, v05 is ready for review. The code could be further optimized to avoid reading the splits back right after they are first written; we can incorporate that if the approach is accepted in general. The existing job-submission tests provide plenty of coverage and helped shape the patch. Since the change is mere refactoring, no new functional tests are urgently needed.

          Ming Ma added a comment -

          Thanks, Gera. Nice work; this will be quite useful. Overall it looks good. Per our offline discussion:

          1. It is unclear whether there are any security-related implications, such as https://issues.apache.org/jira/browse/MAPREDUCE-5663.
          2. Compatibility between a new MR client with this feature and a cluster running old MR. Since the new MR client won't compute splits by default, the job will fail if the cluster still runs old MR, so in this case the new MR client needs to be configured to compute splits. More generally, when a new MR client may talk to some clusters with old MR and others with new MR, it would be nice if the client could discover whether the cluster supports this feature.
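          The second point amounts to a small decision on the client: compute splits locally unless the cluster is known to support computing them in the AM. The following is a hypothetical sketch of that rule. The per-job override and the capability probe are assumptions for illustration, not existing Hadoop configuration keys or APIs.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of where input splits get computed. The override flag and the
 * capability value are illustrative assumptions, not real Hadoop settings.
 */
final class SplitPlacement {

    enum Where { CLIENT, APPLICATION_MASTER }

    /**
     * @param forceClientSplits       hypothetical per-job override, e.g. for an InputFormat
     *                                that depends on client-side state
     * @param clusterSupportsAmSplits capability reported by the cluster; empty when the
     *                                cluster cannot report it (for example, an older MR version)
     */
    static Where decide(boolean forceClientSplits, Optional<Boolean> clusterSupportsAmSplits) {
        if (forceClientSplits) {
            return Where.CLIENT;
        }
        // Treat an unknown capability as "unsupported" so jobs still run on old clusters.
        return clusterSupportsAmSplits.orElse(false) ? Where.APPLICATION_MASTER : Where.CLIENT;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, Optional.empty()));  // CLIENT: capability unknown
        System.out.println(decide(false, Optional.of(true))); // APPLICATION_MASTER
        System.out.println(decide(true, Optional.of(true)));  // CLIENT: per-job opt-out wins
    }
}
```

          Treating an unknown capability as unsupported keeps a new client compatible with old clusters, at the cost of computing splits on the client until the cluster can advertise the feature.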

          Gera Shegalov added a comment -

          v06 adds a unit test and fixes incorrect handling of mapper counts other than 2 in an uberized job.

          Gera Shegalov added a comment -

          v06 does not address Ming Ma's review yet (thank you). Assigned this JIRA to myself, as nobody else seems to be working on it.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12653342/MAPREDUCE-207.v06.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers

          The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapred.pipes.TestPipeApplication

          The test build failed in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4699//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4699//console

          This message is automatically generated.

          Gera Shegalov added a comment -

          v07: disables in-AM split computation for GridMix, because some InputFormats in this job use GridMixJob#descCache, which is on the client side.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12655331/MAPREDUCE-207.v07.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The test build failed in hadoop-tools/hadoop-gridmix

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4730//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4730//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12655331/MAPREDUCE-207.v07.patch
          against trunk revision e1990ab.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5167//console

          This message is automatically generated.

          Allen Wittenauer added a comment -

          Cancelling patch, as it no longer applies.

          Vinod Kumar Vavilapalli added a comment -

          Moving features/enhancements out of previously closed releases into the next minor release 2.8.0.


            People

            • Assignee:
              Gera Shegalov
              Reporter:
              Philip Zeyliger
            • Votes:
              1
              Watchers:
              38
