Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: client, jobtracker
    • Labels:
      None
    • Release Note:
      An efficient implementation for small jobs that runs all tasks in the same JVM, thereby effecting lower latency.

      Description

      Currently, very small map-reduce jobs suffer from high latency due to overheads in Hadoop Map-Reduce such as scheduling, JVM startup, etc. We've periodically tried to optimize all parts of the framework to achieve lower latencies.

      I'd like to turn the problem around a little. I propose we allow very small jobs to run as a single-task job with multiple maps and reduces, i.e., similar to our current implementation of the LocalJobRunner. Under certain conditions (perhaps a user-set configuration, or when the input data is small, i.e., less than one DFS block), we could launch a special task that runs all maps serially, followed by the reduces. This would help small jobs achieve significantly lower latencies, thanks to less scheduling overhead, fewer JVM startups, no shuffle over the network, etc.
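      A minimal sketch of the proposal, in Java since Hadoop is written in Java. The class name, method names, word-count job, and 64 MB block size are hypothetical illustrations, not Hadoop APIs; the eligibility rule simply encodes the two example conditions from the description (a user opt-in, or total input smaller than one DFS block). When the rule passes, every map runs serially in the current JVM, and the reduces then run over in-memory partitions instead of a network shuffle.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of the "small job" idea: run all maps serially in one JVM,
 * then all reduces. Names and thresholds are illustrative assumptions only.
 */
public class SmallJobSketch {

    /** Hypothetical rule: user opted in, or total input is below one DFS block. */
    public static boolean runAsSingleTask(boolean userRequested, long totalInputBytes, long dfsBlockSize) {
        return userRequested || totalInputBytes < dfsBlockSize;
    }

    /** Word count as a stand-in job; each input split is a String of words. */
    public static Map<String, Integer> runSerially(List<String> splits, int numReduces) {
        if (numReduces < 1) {
            throw new IllegalArgumentException("numReduces must be >= 1");
        }
        // One in-memory partition per reduce replaces the shuffle over the network.
        List<Map<String, List<Integer>>> partitions = new ArrayList<>();
        for (int r = 0; r < numReduces; r++) {
            partitions.add(new TreeMap<>());
        }
        // Map phase: every map runs one after another in this JVM.
        for (String split : splits) {
            for (String word : split.split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // leading whitespace yields an empty token
                }
                int r = (word.hashCode() & Integer.MAX_VALUE) % numReduces;
                partitions.get(r).computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce phase: starts only after all maps have finished.
        // Each key lives in exactly one partition, so puts never collide.
        Map<String, Integer> output = new TreeMap<>();
        for (Map<String, List<Integer>> partition : partitions) {
            for (Map.Entry<String, List<Integer>> e : partition.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) {
                    sum += v;
                }
                output.put(e.getKey(), sum);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("a b a", "b c");
        long inputBytes = 0;
        for (String s : splits) {
            inputBytes += s.getBytes(StandardCharsets.UTF_8).length;
        }
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB DFS block size
        if (runAsSingleTask(false, inputBytes, blockSize)) {
            System.out.println(runSerially(splits, 2)); // prints {a=2, b=2, c=1}
        }
    }
}
```

      The point of the sketch is where the costs go: scheduling and JVM startup are paid once for the whole job rather than once per task, and each reduce reads its input from a local map rather than fetching it from other nodes.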

      This would be a huge benefit, especially on large clusters, to small Hive/Pig queries.

      Thoughts?


          Activity

          Arun C Murthy created issue -
          Arun C Murthy made changes -
          Field Original Value New Value
          Attachment MAPREDUCE-1220_yhadoop20.patch [ 12435542 ]
          Greg Roelofs made changes -
          Greg Roelofs made changes -
          Assignee Arun C Murthy [ acmurthy ] Greg Roelofs [ roelofs ]
          Greg Roelofs made changes -
          Nigel Daley made changes -
          Fix Version/s 0.22.0 [ 12314184 ]
          Greg Roelofs made changes -
          Greg Roelofs made changes -
          Attachment MR-1220.v2b.sshot-01-jobtracker.jsp.png [ 12473073 ]
          Greg Roelofs made changes -
          Attachment MR-1220.v1b.sshot-02-jobdetails.jsp.png [ 12473074 ]
          Greg Roelofs made changes -
          Greg Roelofs made changes -
          Link This issue is related to MAPREDUCE-2405 [ MAPREDUCE-2405 ]
          Arun C Murthy made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Release Note An efficient implementation of small jobs by running all tasks in the same JVM, there-by affecting lower latency.
          Fix Version/s 0.23.0 [ 12315570 ]
          Resolution Fixed [ 1 ]
          Arun C Murthy made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Arun C Murthy made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          Todd Lipcon made changes -
          Release Note An efficient implementation of small jobs by running all tasks in the same JVM, there-by affecting lower latency. An efficient implementation of small jobs by running all tasks in the same JVM, there-by effecting lower latency.
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Greg Roelofs
              Reporter:
              Arun C Murthy
            • Votes:
              3
              Watchers:
              35

              Dates

              • Created:
                Updated:
                Resolved:
