Hadoop Map/Reduce: MAPREDUCE-2405

MR-279: Implement uber-AppMaster (in-cluster LocalJobRunner for MRv2)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: mrv2
    • Labels:
      None
    • Release Note:
      An efficient implementation of small jobs that runs all tasks in the MR ApplicationMaster JVM, thereby achieving lower latency.

      Description

      "Port" MAPREDUCE-1220 to MRv2. This is an optimization for small jobs wherein all tasks run on the same node in the same JVM/container.

          Activity

          Greg Roelofs added a comment -

          As with MAPREDUCE-1220's "UberTask", which essentially runs small jobs sequentially within a single Task (and therefore in a single JVM), the MRv2 version does so within an "UberAppMaster"--which is really just the regular MRAppMaster with two container services overridden (allocator, launcher). The analogue to UberTask's run() method is LocalContainerLauncher's SubtaskRunner.run(), which hooks into the state machines and executes the subtasks sequentially.

          This design is much cleaner than that in MR-1220 since the subtasks are "real" and can communicate directly with external entities, and the uber-AM is the regular AM and requires no special UI handling (though we'll want to flag uberized AMs and jobs in the UI somehow). UberTask, on the other hand, had to translate its subtasks' status-updates and was very awkward to integrate with the UI (is it a ReduceTask or an UberTask? – the cascade of required changes was huge and never completed).
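
          A minimal sketch of the idea described above, not the patch's actual code: a launcher service is swapped for a local one that runs each subtask sequentially in the current JVM instead of starting a remote container. The names Launcher, LocalLauncherSketch, and UberSketch are hypothetical stand-ins and do not correspond to Hadoop interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a container-launcher service (not a Hadoop interface).
interface Launcher {
    void launch(Runnable subtask);
}

// Uber-style launcher: rather than starting a remote container, it queues
// each subtask and later runs them one at a time inside this JVM.
class LocalLauncherSketch implements Launcher {
    private final List<Runnable> pending = new ArrayList<>();

    @Override
    public void launch(Runnable subtask) {
        pending.add(subtask);
    }

    // Loosely analogous to a subtask runner's run(): execute queued subtasks
    // sequentially, in submission order, and report how many completed.
    int runAll() {
        int completed = 0;
        for (Runnable subtask : pending) {
            subtask.run();
            completed++;
        }
        pending.clear();
        return completed;
    }
}

class UberSketch {
    // Submits two "map" subtasks and one "reduce" subtask, runs them in-process,
    // and returns the order in which they executed.
    static List<String> demo() {
        List<String> order = new ArrayList<>();
        LocalLauncherSketch launcher = new LocalLauncherSketch();
        launcher.launch(() -> order.add("map-0"));
        launcher.launch(() -> order.add("map-1"));
        launcher.launch(() -> order.add("reduce-0"));
        launcher.runAll();
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

          Because only the launcher is replaced, the code that submits subtasks does not need to know whether they run remotely or in-process, which mirrors the point that the uber-AM is otherwise the regular AM.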

          Kudos go to Sharad Agarwal for this design.

          Greg Roelofs added a comment -

          Preliminary patch: still full of debug noise, FIXMEs, TODOs, etc., but reasonably functional at the level of current testing.

          This passes all current unit tests except testFailingMapper() in TestMRJobs, and even that one basically works except for an Avro NPE after the job completes and the AM shuts down.

          Sharad Agarwal added a comment -

          The overall direction looks good.

          Noticed in JobImpl:

          
          .addTransition(JobState.INITED, JobState.KILL_WAIT,
                        JobEventType.JOB_KILL,
                        KILL_NEW_JOB_TRANSITION)
          

          A kill event in the INITED state should go directly to the KILLED state. It also should not use KILL_NEW_JOB_TRANSITION, because that transition does not call the abort logic. We need a new transition here that executes abort.
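
          The requested behavior can be illustrated with a toy state machine. This is a sketch under stated assumptions, not JobImpl: the SketchJobState enum, the JobSketch class, and its abort() hook are hypothetical, and the RUNNING case is included only for contrast.

```java
// Hypothetical, simplified job states for illustration only.
enum SketchJobState { INITED, RUNNING, KILL_WAIT, KILLED }

class JobSketch {
    SketchJobState state = SketchJobState.INITED;
    boolean aborted = false;

    // Hypothetical abort hook standing in for the job's abort/cleanup logic.
    private void abort() {
        aborted = true;
    }

    // Handles a kill event according to the current state.
    void kill() {
        switch (state) {
            case INITED:
                // No tasks are running yet, so there is nothing to wait for:
                // run the abort logic and go straight to KILLED.
                abort();
                state = SketchJobState.KILLED;
                break;
            case RUNNING:
                // Assumption for contrast: a running job must first wait
                // for its tasks to be killed.
                state = SketchJobState.KILL_WAIT;
                break;
            default:
                // Already being killed or killed: ignore the event.
                break;
        }
    }
}
```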

          Greg Roelofs added a comment -

          De-personalized, cleaned up, and a better unit test.

          Unfortunately, the git mirror is hosed, so I can't merge with the recently committed MAPREDUCE-2414 (i.e., this won't apply). And I just remembered that I forgot to add boilerplate to the three or four new files in here. But other than those caveats (and keeping in mind that this is a first cut, as previously noted), this is ready for pre-commit review.

          Greg Roelofs added a comment -

          Merged with MAPREDUCE-2414, boilerplate added, additional minor cleanups. This should be ready for commit to the MR-279 branch.

          Mahadev konar added a comment -

          I just pushed this to the MR-279 branch. Thanks, Greg.

          Greg Roelofs added a comment -

          Thanks, Mahadev! You're clearly a person of outstanding character, intelligence, and better-than-average looks.

          (I'll just note here that there's further work/fixes to be done, particularly including counters, UI, the memory and input-size uber-decision criteria, and AM-restart on task-attempt failures, but I'll file a follow-up JIRA for that.)
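
          As a rough illustration of what memory and input-size criteria for the uber decision might look like, here is a hedged sketch. The class, method, and parameter names are hypothetical, and the limits are supplied by the caller because this issue does not state any actual values.

```java
class UberDecisionSketch {
    // Returns true when the job looks small enough to run inside the AM.
    // Every parameter is a hypothetical input; no real Hadoop defaults are assumed.
    static boolean isSmallEnough(long inputBytes, long maxInputBytes,
                                 int taskMemoryMb, int amMemoryMb) {
        // Input-size criterion: total input must not exceed the configured cap.
        boolean smallInput = inputBytes <= maxInputBytes;
        // Memory criterion: a task's requested memory must fit in the AM's container.
        boolean fitsInAm = taskMemoryMb <= amMemoryMb;
        return smallInput && fitsInAm;
    }
}
```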


            People

            • Assignee: Greg Roelofs
            • Reporter: Mahadev konar
            • Votes: 0
            • Watchers: 11
