Hadoop Common
HADOOP-1042

Improve the handling of failed map output fetches

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.2
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently, whenever the fetch of a map output fails, the corresponding MapOutputLocation is added to a List for a later retry. However, if the failure was caused by a lost task, the entry that was added is never deleted, so unnecessary retries happen. This situation should be prevented.

      1. 1042.new.patch
        6 kB
        Devaraj Das
      2. 1042.patch
        6 kB
        Devaraj Das

        Activity

        Doug Cutting added a comment -

        I committed this yesterday. Thanks, Devaraj!

        Hadoop QA added a comment -

        +1, because http://issues.apache.org/jira/secure/attachment/12352150/1042.new.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/512006 . Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

        Devaraj Das added a comment -

        Ok, this patch incorporates David's comment (replaced only one line with what David suggested since that was the only place where such a thing was done). Actually, there are lots of places in the codebase where similar warnings can be removed.

        David Bowen added a comment -

        A small point: in lines like this:

        Map<Integer, MapOutputLocation> knownOutputs = new HashMap();

        it would be preferable (avoid a warning) to specify the types on the right hand side, like this:

        Map<Integer,MapOutputLocation> knownOutputs = new HashMap<Integer,MapOutputLocation>();
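
        The warning in question is the compiler's unchecked-conversion warning for raw types. A minimal sketch of the two forms follows; String stands in for MapOutputLocation so that the example is self-contained, and the class and method names are illustrative only, not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class RawTypeSketch {

    // Parameterized on both sides: compiles without an unchecked warning.
    // String is a stand-in for MapOutputLocation (assumption for this sketch).
    static Map<Integer, String> typed() {
        Map<Integer, String> knownOutputs = new HashMap<Integer, String>();
        knownOutputs.put(7, "host-a");
        return knownOutputs;
    }

    // Raw HashMap on the right-hand side: still compiles and behaves the same
    // at run time, but javac flags the assignment as an unchecked conversion.
    static Map<Integer, String> raw() {
        Map<Integer, String> knownOutputs = new HashMap();
        knownOutputs.put(7, "host-a");
        return knownOutputs;
    }

    public static void main(String[] args) {
        System.out.println(typed().equals(raw()));
    }
}
```

        Both methods build the same map; the only difference is whether the compiler can check the type arguments at the point of construction.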

        Devaraj Das added a comment -

        This patch does the following (everything in the file ReduceTaskRunner.java):
        1) Changes the data structure of knownOutputs from a List to a Map. This makes it easier to replace the MapOutputLocation objects of failed fetches (if the JobTracker later gives us new locations for those mapIds).
        2) Changes ListIterator to Iterator (it is not straightforward to get a ListIterator out of a Map, and we don't use the features of a ListIterator anyway).
        3) Changes the order in which entries (mapId/MapOutputLocation pairs) are added to the knownOutputs Map: entries for failed fetches are added first, then the new entries obtained from the JobTracker. This ensures that the new entries overwrite the old (failed) entries with the same mapId key.
        4) Removes the call to Collections.shuffle() and the associated Random object. Since the order of fetching map outputs is no longer randomized, we don't need them.
        5) queryJobTracker now returns a List<MapOutputLocation> instead of an array of MapOutputLocation.
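
        The insertion order described in point 3 can be illustrated with a small sketch. This is not the code from the patch: the MapOutputLocation class below is a hypothetical stand-in holding only a map id and a host, and mergeLocations is an illustrative helper name chosen for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KnownOutputsSketch {

    // Hypothetical stand-in for Hadoop's MapOutputLocation: a map id and a host.
    static final class MapOutputLocation {
        final int mapId;
        final String host;

        MapOutputLocation(int mapId, String host) {
            this.mapId = mapId;
            this.host = host;
        }
    }

    // Illustrative helper (not from the patch): add the failed fetches first,
    // then the locations newly reported by the JobTracker, so that a new
    // location replaces the stale one stored under the same map id.
    static Map<Integer, MapOutputLocation> mergeLocations(
            List<MapOutputLocation> failedFetches,
            List<MapOutputLocation> newLocations) {
        Map<Integer, MapOutputLocation> knownOutputs =
                new HashMap<Integer, MapOutputLocation>();
        for (MapOutputLocation loc : failedFetches) {
            knownOutputs.put(loc.mapId, loc);
        }
        for (MapOutputLocation loc : newLocations) {
            knownOutputs.put(loc.mapId, loc); // overwrites any entry with the same mapId
        }
        return knownOutputs;
    }

    public static void main(String[] args) {
        List<MapOutputLocation> failed = new ArrayList<MapOutputLocation>();
        failed.add(new MapOutputLocation(1, "host-a")); // task later lost
        failed.add(new MapOutputLocation(2, "host-b")); // plain retry

        List<MapOutputLocation> fresh = new ArrayList<MapOutputLocation>();
        fresh.add(new MapOutputLocation(1, "host-c")); // re-executed map 1

        Map<Integer, MapOutputLocation> known = mergeLocations(failed, fresh);
        for (Map.Entry<Integer, MapOutputLocation> e : known.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().host);
        }
    }
}
```

        With a List, the stale entry for map 1 would remain alongside the new one and be retried needlessly; keying by mapId leaves exactly one location per map.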


          People

          • Assignee:
            Devaraj Das
            Reporter:
            Devaraj Das
          • Votes:
            0
            Watchers:
            0
