Hadoop Common
  1. Hadoop Common
  2. HADOOP-1105

Reducers don't make "progress" while iterating through values

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.12.3
    • Component/s: None
    • Labels:
      None

      Description

      Reduces make progress when they go to a new key, but not when they read the next value, which could cause reduces to time out when they have a lot of values for the same key.

      1. 1105.new1.patch
        10 kB
        Devaraj Das
      2. 1105.new1.patch
        4 kB
        Devaraj Das
      3. 1105.patch
        5 kB
        Devaraj Das
      4. 1105.patch
        3 kB
        Devaraj Das
      5. 1105-3.patch
        6 kB
        Owen O'Malley
      6. 1105-4.patch
        6 kB
        Owen O'Malley

        Activity

        Hide
        Devaraj Das added a comment -

        I am attaching a patch for this. Please review it, and if it seems ok, please schedule it for the 0.12.3 release. It will really help in alleviating the "Task failed to report status for X seconds .. killing" failures.

        Show
        Devaraj Das added a comment - I am attaching a patch for this. Please review it, and if it seems ok, please schedule it for the 0.12.3 release. It will really help in alleviating the "Task failed to report status for X seconds .. killing" failures.
        Hide
        Nigel Daley added a comment -

        Adding to 0.12.3 queue as suggested by Devaraj. If others disagree, kick it back to 0.13.0

        Show
        Nigel Daley added a comment - Adding to 0.12.3 queue as suggested by Devaraj. If others disagree, kick it back to 0.13.0
        Hide
        Devaraj Das added a comment -

        This patch makes some fields volatile (the fields that reportProgress uses, since reportProgress happens in a separate thread), instead of making the "synchronized" call at the ReduceTask. The main change here (in the patches) is that reportProgress is not done as part of every invocation of reducer(key, value[]); instead, the progress field is just set, and the thread does the actual job of reporting those to the tasktracker. This saves the overhead of making RPC connections to a overloaded/slow TaskTracker inline with the reducer method invocations. Ditto for Reporter.setStatus call (in Task.java). It should improve the reducer performance overall.

        Show
        Devaraj Das added a comment - This patch makes some fields volatile (the fields that reportProgress uses, since reportProgress happens in a separate thread), instead of making the "synchronized" call at the ReduceTask. The main change here (in the patches) is that reportProgress is not done as part of every invocation of reducer(key, value[]); instead, the progress field is just set, and the thread does the actual job of reporting those to the tasktracker. This saves the overhead of making RPC connections to a overloaded/slow TaskTracker inline with the reducer method invocations. Ditto for Reporter.setStatus call (in Task.java). It should improve the reducer performance overall.
        Hide
        David Bowen added a comment -

        I don't think it is useful to use volatile in these two cases:

        private volatile Progress taskProgress = new Progress();
        private volatile Counters counters = new Counters();

        These would be useful if one thread was assigning to the variables taskProgress and counters, and another reading them. What we have instead is one thread calling updater methods on the objects, and another calling getters.

        I think the threading issues need to be handled in the objects themselves. Counters already aims to be thread-safe. If you don't want to put synchroniztion in Progress, you might want to make its status, progress and currentPhase fields volatile.

        Show
        David Bowen added a comment - I don't think it is useful to use volatile in these two cases: private volatile Progress taskProgress = new Progress(); private volatile Counters counters = new Counters(); These would be useful if one thread was assigning to the variables taskProgress and counters, and another reading them. What we have instead is one thread calling updater methods on the objects, and another calling getters. I think the threading issues need to be handled in the objects themselves. Counters already aims to be thread-safe. If you don't want to put synchroniztion in Progress, you might want to make its status, progress and currentPhase fields volatile.
        Hide
        Devaraj Das added a comment -

        Thanks David for the comment. Attached is the updated patch.

        Show
        Devaraj Das added a comment - Thanks David for the comment. Attached is the updated patch.
        Hide
        Hadoop QA added a comment -

        -1, because 3 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12354774/1105.new1.patch against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/524903. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

        Show
        Hadoop QA added a comment - -1, because 3 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12354774/1105.new1.patch against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/524903 . Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch
        Hide
        Owen O'Malley added a comment -

        This patch does the wrong thing:
        1. The new thread name is really long (minor complaint of course).
        2. The reduce will never get killed for failing to make progress, because the progress thread will always call progress.
        3. It removes the calls to progress that were already there and fails to add the necessary new one.

        Show
        Owen O'Malley added a comment - This patch does the wrong thing: 1. The new thread name is really long (minor complaint of course). 2. The reduce will never get killed for failing to make progress, because the progress thread will always call progress. 3. It removes the calls to progress that were already there and fails to add the necessary new one.
        Hide
        Devaraj Das added a comment -

        Thanks Owen for pointing out the flaw with the patch!! I was too concentrated on the performance and lost track of the functionality. Anyway here's the updated patch. Some main points about the patch:
        1) Introduces a way for the progress reporting thread to block if there is no status to report. The calls to informReduceProgress wakes up the thread if it was blocked.
        2) Makes sure that the progress reporting thread is killed before the task exits. For this finally clause has been introduced in run(conf, umbilical). This caused quite a lot of indentation changes (making the patch seemingly complicated, sigh).
        3) Overides the method getReporter from Task.java. This is because in ReduceTask, setStatus behaves slightly differently. This ensures that MapTask's Reporter object behaves as it used to earlier (as an aside, the MapTask's Reporter.setStatus also needs to be tweaked on similar lines as the ReduceTask's, but think it could be a separate issue by itself).

        Show
        Devaraj Das added a comment - Thanks Owen for pointing out the flaw with the patch!! I was too concentrated on the performance and lost track of the functionality. Anyway here's the updated patch. Some main points about the patch: 1) Introduces a way for the progress reporting thread to block if there is no status to report. The calls to informReduceProgress wakes up the thread if it was blocked. 2) Makes sure that the progress reporting thread is killed before the task exits. For this finally clause has been introduced in run(conf, umbilical). This caused quite a lot of indentation changes (making the patch seemingly complicated, sigh). 3) Overides the method getReporter from Task.java. This is because in ReduceTask, setStatus behaves slightly differently. This ensures that MapTask's Reporter object behaves as it used to earlier (as an aside, the MapTask's Reporter.setStatus also needs to be tweaked on similar lines as the ReduceTask's, but think it could be a separate issue by itself).
        Hide
        Owen O'Malley added a comment -

        I think that we should limit this patch to just the bugfix to prevent any destabilization of the 12 line. I would like to create a separate issue that asks for progress reporting to be done in a separate thread.

        Show
        Owen O'Malley added a comment - I think that we should limit this patch to just the bugfix to prevent any destabilization of the 12 line. I would like to create a separate issue that asks for progress reporting to be done in a separate thread.
        Show
        Hadoop QA added a comment - +1, because http://issues.apache.org/jira/secure/attachment/12354893/1105-3.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/525290 . Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch
        Hide
        Owen O'Malley added a comment -

        Devaraj pointed out that I shouldn't have added IOException to the throws clause of informReduceProgress, although in reality it can't be thrown by the current code.

        Show
        Owen O'Malley added a comment - Devaraj pointed out that I shouldn't have added IOException to the throws clause of informReduceProgress, although in reality it can't be thrown by the current code.
        Hide
        Owen O'Malley added a comment -

        This version of the patch removes the IOException from informReduceProgress.

        Show
        Owen O'Malley added a comment - This version of the patch removes the IOException from informReduceProgress.
        Hide
        Tom White added a comment -

        I've just committed this. Thanks Devaraj and Owen!

        Show
        Tom White added a comment - I've just committed this. Thanks Devaraj and Owen!
        Hide
        Hadoop QA added a comment -
        Show
        Hadoop QA added a comment - Integrated in Hadoop-Nightly #47 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/47/ )
        Hide
        Tom White added a comment -

        Merged into 0.12 branch in revision 525761.

        Show
        Tom White added a comment - Merged into 0.12 branch in revision 525761.

          People

          • Assignee:
            Owen O'Malley
            Reporter:
            Owen O'Malley
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development