Hadoop Map/Reduce: MAPREDUCE-2424

MR-279: counters/UI/etc. for uber-AppMaster (in-cluster LocalJobRunner for MRv2)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: mrv2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Polish uber-AM (MAPREDUCE-2405). Specifically:

      • uber-specific counters ("command-line UI")
      • GUI indicators
        • RM all-containers level
        • multi-job app level [if exists]
        • single-job level
      • fix uber-decision ("is this a small job?"):
        • memory criterion
        • input-bytes criterion
      • disable speculation
      • isUber() method (somewhere) for unit tests to use
      • delete (most of) old UberTask code (MAPREDUCE-1220; came in with initial MR-279 branch)
      • implement non-RPC, local version of umbilical
      • AM restart (default 4 tries) on another node on any task-attempt failure
      • uber-specific metrics?
      • rename configurables? (still "ubertask"-based)
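
      As a rough illustration of the "is this a small job?" decision listed above, here is a minimal sketch that combines the memory criterion and the input-bytes criterion. The class name, method name, parameters, and the 64 MB limit are all hypothetical and chosen for illustration; they are not taken from MRAppMaster or from any Hadoop configurable.

```java
/** Hypothetical sketch of an uber-decision check; not the MRAppMaster implementation. */
final class UberDecisionSketch {

    /**
     * Returns true only if the job passes both criteria:
     * its total input is within the byte limit, and the memory a task
     * needs fits inside the memory of the AM's own container.
     * All four parameters are assumed inputs, not real Hadoop settings.
     */
    static boolean isSmallJob(long inputBytes, long maxInputBytes,
                              long requiredTaskMemoryMb, long amContainerMemoryMb) {
        boolean smallInput = inputBytes <= maxInputBytes;
        boolean fitsInAmMemory = requiredTaskMemoryMb <= amContainerMemoryMb;
        return smallInput && fitsInAmMemory;
    }

    public static void main(String[] args) {
        long maxInputBytes = 64L * 1024 * 1024; // assumed limit, for illustration only
        long tenMb = 10L * 1024 * 1024;
        // Small input, task memory fits in the AM container.
        System.out.println(isSmallJob(tenMb, maxInputBytes, 1024, 2048));
        // Small input, but the task needs more memory than the AM container has.
        System.out.println(isSmallJob(tenMb, maxInputBytes, 4096, 2048));
    }
}
```

      Requiring both criteria to hold keeps the check conservative: a job that fails either one would run in ordinary (non-uber) mode.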

          Activity

          Greg Roelofs added a comment -

          Thanks, Mahadev!

          We discussed the laundry list offline; each one really should be a separate JIRA, but I don't have the time to do that at the moment. The (now committed) patch above addressed only the first item and the final third of the second item. Which is to say, the following remain to be done:

          • GUI indicators
            • RM all-containers level
            • multi-job app level [if exists]
          • fix uber-decision ("is this a small job?"):
            • memory criterion
            • input-bytes criterion
          • disable speculation
          • isUber() method (somewhere) for unit tests to use
          • delete (most of) old UberTask code (MAPREDUCE-1220; came in with initial MR-279 branch)
          • implement a non-RPC, local version of umbilical
          • AM restart (default 4 tries) on another node on any task-attempt failure
          • uber-specific metrics?
          • rename configurables? (still "ubertask"-based)

          Mahadev also indicated a desire to put the subtasks back into their own JVMs (for better isolation), assuming the performance cost isn't too great. And as noted above, a better approach for the counters would be job-level counters (not task-level as implemented), since uber-AM is really a job-level feature.

          Mahadev konar added a comment -

          I just committed this. Thanks, Greg!

          Greg Roelofs added a comment -

          Same as v1 except with a new "Uberized: true/false" line in the header block of the single-job GUI page. (Many thanks to Luke Lu for basically coding the GUI change verbally.)

          The multi-job RM page (i.e., marking each job entry as either uberized or not for a quick overview) is likely to be much harder. Unlike the original UberTask approach, here the decision to uberize simply changes some internal state within MRAppMaster after it's already running, so one would need some sort of protocol for an AM to pass either state info or maybe just a descriptive string to the RM. (Note that the latter would have some security implications; the string would need to be sanitized and perhaps truncated at 40 [Unicode] characters or something to avoid DoS issues. On the other hand, it's probably hard to come up with a sufficiently general state mechanism that could accommodate future non-MR AMs without going full Avro/PB self-descriptive.)
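
          To make the sanitize-and-truncate idea concrete, here is a minimal sketch of what the RM side might do with a descriptive string received from an AM. The class and method names are hypothetical, and the set of dropped characters (control characters, lone surrogates, and HTML-significant characters) is an assumption about what "sanitized" would mean for a web page; only the 40-character cap comes from the comment above. The limit is counted in Unicode code points rather than Java chars, so supplementary characters are never split.

```java
/** Hypothetical sketch of sanitizing an AM-supplied status string; not an existing RM API. */
final class AmStatusStringSketch {

    /** Cap suggested in the discussion: 40 Unicode characters (code points). */
    static final int MAX_CODE_POINTS = 40;

    static String sanitize(String raw) {
        if (raw == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        int kept = 0;
        for (int i = 0; i < raw.length() && kept < MAX_CODE_POINTS; ) {
            int cp = raw.codePointAt(i);
            i += Character.charCount(cp);
            // An unpaired surrogate comes back from codePointAt as itself; drop it.
            boolean loneSurrogate =
                cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE;
            // Assumed policy: also drop characters that are significant in HTML.
            boolean markup =
                cp == '<' || cp == '>' || cp == '&' || cp == '"' || cp == '\'';
            if (Character.isISOControl(cp) || loneSurrogate || markup) {
                continue;
            }
            out.appendCodePoint(cp);
            kept++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("uberized: <b>true</b>\n"));
    }
}
```

          Dropping characters is the simplest policy to reason about; escaping them instead would preserve more of the original text but pushes the problem onto whatever renders the string.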

          Greg Roelofs added a comment -

          UI part 1: counters.

          This is implemented as task-level counters since job-level counters aren't implemented yet and will (I believe) require a new Job state-machine transition and self-arcs on multiple Job states to handle the event in all of the desirable cases. I added a comment to that effect in JobImpl but will leave the state-machine details to Sharad or somebody with more experience there.

          I'll try to get at least some of the GUI changes uploaded by tomorrow.


            People

            • Assignee:
              Greg Roelofs
              Reporter:
              Greg Roelofs
            • Votes:
              0
              Watchers:
              3
