Hadoop Common / HADOOP-248

locating map outputs via random probing is inefficient

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.1
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels: None

      Description

      Currently the ReduceTaskRunner polls the JobTracker with a random list of map tasks, asking for their output locations. It would be better if the JobTracker kept an ordered log of map outputs and the interface were changed to:

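      class MapLocationResults {
        // The "timestamp" to pass to the next locateMapOutputs call
        public int getTimestamp();
        // The map output locations returned by this call
        public MapOutputLocation[] getLocations();
      }

      interface InterTrackerProtocol {
        ...
        // prevTimestamp is the value returned by the caller's previous result
        MapLocationResults locateMapOutputs(int prevTimestamp);
      }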

      The intention is that each time a ReduceTaskRunner calls locateMapOutputs, it passes back the "timestamp" it received in the previous result. That way, reduces can easily find the map outputs that are new since their last call. This should help the "ramp up" when the maps first start finishing.
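
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: it assumes the "timestamp" is simply the length of an append-only log, and the names MapOutputLog, Location, Results, and recordCompletion are hypothetical stand-ins (Location for MapOutputLocation, Results for MapLocationResults).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an ordered, append-only log of map outputs. */
class MapOutputLog {

    /** Stand-in for MapOutputLocation; these fields are assumptions. */
    static final class Location {
        final int mapId;
        final String host;

        Location(int mapId, String host) {
            this.mapId = mapId;
            this.host = host;
        }
    }

    /** Mirrors the shape of the proposed MapLocationResults. */
    static final class Results {
        private final int timestamp;
        private final Location[] locations;

        Results(int timestamp, Location[] locations) {
            this.timestamp = timestamp;
            this.locations = locations;
        }

        int getTimestamp() { return timestamp; }

        Location[] getLocations() { return locations; }
    }

    // Entries are only ever appended, so an index into this list
    // can serve as the "timestamp" (an assumption of this sketch).
    private final List<Location> log = new ArrayList<>();

    /** Appends a finished map's output location to the end of the log. */
    synchronized void recordCompletion(int mapId, String host) {
        log.add(new Location(mapId, host));
    }

    /** Returns the entries appended since prevTimestamp and the new timestamp. */
    synchronized Results locateMapOutputs(int prevTimestamp) {
        // Clamp so that a stale or out-of-range value cannot cause an exception.
        int from = Math.max(0, Math.min(prevTimestamp, log.size()));
        Location[] fresh = log.subList(from, log.size()).toArray(new Location[0]);
        return new Results(log.size(), fresh);
    }
}
```

In this sketch a reduce would start with a timestamp of 0 and feed each result's getTimestamp() into its next call, so every call returns only entries it has not seen before, and an empty array when no new maps have finished.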

        Attachments

        1. 248-initial7.patch (21 kB, Devaraj Das)
        2. 248-initial8.patch (22 kB, Devaraj Das)
        3. 248-9.patch (21 kB, Owen O'Malley)
        4. 248-fixed1.patch (23 kB, Devaraj Das)

              People

              • Assignee: Devaraj Das
              • Reporter: Owen O'Malley
              • Votes: 0
              • Watchers: 0
