Hadoop Common / HADOOP-2208

Reduce frequency of Counter updates in the task tracker status

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently, counter updates are sent from the task tracker to the job tracker on every heartbeat, and both the counter names and the values are sent each time. This can be improved by sending names and values the first time and only the values after that.
      The frequency can also be reduced by sending an update only when the counters have changed.

      1. patch-2208.txt
        6 kB
        Amareshwari Sriramadasu
      2. patch-2208.txt
        10 kB
        Amareshwari Sriramadasu
      3. patch-2208.txt
        10 kB
        Amareshwari Sriramadasu
      4. patch-2208.txt
        10 kB
        Amareshwari Sriramadasu
      5. patch-2208.txt
        11 kB
        Amareshwari Sriramadasu

        Activity

        Arun C Murthy added a comment - edited

        One really simple option is to add a Counters.clear method and call it from TaskStatus.clearStatus; this will ensure that only updated counters are sent in every heartbeat.

        I jumped to a hasty conclusion. Forget it, won't work.

        The way to get that to work is to add a boolean updated flag to CounterRec, set it to true on every Counter.incrCounter call, and check the flag so that only updated counters are sent. The updated flag should be cleared via TaskStatus.clearStatus. Thoughts?
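
        The flag-based proposal above could look roughly like the following. This is only a sketch under stated assumptions: FlaggedCounters, Rec, updatedSnapshot and clearUpdatedFlags are hypothetical names invented for illustration and are not Hadoop's actual Counters or CounterRec code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the counters discussed above; not Hadoop's real Counters class.
class FlaggedCounters {
    // One counter value plus the "updated" flag proposed in the comment.
    private static final class Rec {
        long value;
        boolean updated;
    }

    private final Map<String, Rec> recs = new LinkedHashMap<>();

    // Increment a counter and mark it as changed since the last clear.
    void incrCounter(String name, long amount) {
        Rec r = recs.computeIfAbsent(name, k -> new Rec());
        r.value += amount;
        r.updated = true;
    }

    // Current values of only those counters that changed since the last clear.
    Map<String, Long> updatedSnapshot() {
        Map<String, Long> out = new LinkedHashMap<>();
        for (Map.Entry<String, Rec> e : recs.entrySet()) {
            if (e.getValue().updated) {
                out.put(e.getKey(), e.getValue().value);
            }
        }
        return out;
    }

    // Reset the flags; in the proposal this would be triggered from TaskStatus.clearStatus.
    void clearUpdatedFlags() {
        for (Rec r : recs.values()) {
            r.updated = false;
        }
    }
}
```

        A caller would take updatedSnapshot() when building a status message and call clearUpdatedFlags() once that status has been cleared, so the next message only carries counters touched in between.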

        Owen O'Malley added a comment -

        Actually, I would also cap sending the counters to once every 10 seconds and when the task changes state. That way, if you send an extra heartbeat to get a new task, you won't send the counters.
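
        The capping rule suggested above can be sketched as a small decision helper. CounterSendThrottle and shouldSend are hypothetical names, and the interval is passed in rather than fixed, since the thread mentions both 10 seconds and 1 minute.

```java
// Hypothetical helper; class and method names are illustrative, not taken from the patch.
class CounterSendThrottle {
    private final long minIntervalMillis;
    private long lastSentMillis;
    private boolean sentOnce = false;

    CounterSendThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Decide whether the heartbeat at time nowMillis should carry counters.
    // Counters go out when the task changed state, or when the interval has elapsed.
    boolean shouldSend(long nowMillis, boolean taskStateChanged) {
        boolean due = !sentOnce || nowMillis - lastSentMillis >= minIntervalMillis;
        if (taskStateChanged || due) {
            lastSentMillis = nowMillis;
            sentOnce = true;
            return true;
        }
        return false;
    }
}
```

        With this shape, an extra heartbeat sent only to request a new task falls inside the interval and leaves the counters out.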

        Amareshwari Sriramadasu added a comment -

        The attached patch sends only updated counters. We mark a counter as updated when it is created with a value or when an increment operation is done on it. Sending the counters is capped at once per minute.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12371357/patch-2208.txt
        against trunk revision r602790.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1306/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1306/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1306/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1306/console

        This message is automatically generated.

        Devaraj Das added a comment -

        Sorry, this patch doesn't apply anymore. Could you please regenerate this?

        Amareshwari Sriramadasu added a comment -

        Patch regenerated in sync with trunk.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12371778/patch-2208.txt
        against trunk revision r604451.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1362/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1362/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1362/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1362/console

        This message is automatically generated.

        Arun C Murthy added a comment -

        I see a potential bug with this patch: the TaskTracker caches the counters sent by the child task until they are sent out to the JobTracker via the heartbeat. Hence, we need to merge the ones received from the child in TaskStatus.statusUpdate, not just overwrite them as is done today.

        Minor nit: the merge of the counters probably belongs in TaskInProgress.recomputeProgress rather than TaskInProgress.updateStatus ...
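
        The difference between merging and overwriting the cached counters can be sketched as follows. CachedTaskCounters and drainForHeartbeat are hypothetical names, and the sketch assumes that each child update carries the current totals of only the counters that changed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache of child-task counters held until the next heartbeat;
// not the actual TaskStatus or TaskTracker code.
class CachedTaskCounters {
    private final Map<String, Long> pending = new HashMap<>();

    // Merge a partial update from the child: newer totals replace older ones
    // by name, and counters missing from this update are kept.
    void statusUpdate(Map<String, Long> updatedFromChild) {
        pending.putAll(updatedFromChild);
    }

    // Hand the merged counters to the heartbeat and start a fresh cache.
    Map<String, Long> drainForHeartbeat() {
        Map<String, Long> out = new HashMap<>(pending);
        pending.clear();
        return out;
    }
}
```

        If statusUpdate replaced the cache instead of merging into it, a counter reported in one child update would be lost whenever a second update arrived before the heartbeat.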

        Amareshwari Sriramadasu added a comment -

        Submitting patch after incorporating review comments.

        Devaraj Das added a comment -

        Some comments: you don't have to call setSendCounters in JobInProgress.java, and you should probably rename the APIs getSendCounters and setSendCounters to something more intuitive.

        Amareshwari Sriramadasu added a comment -

        Submit again after incorporating Devaraj's comments.

        Owen O'Malley added a comment -

        I'm pretty worried about the approach of this patch. It changes the behavior from always sending the current values of the counters to sending only the ones that changed. That doesn't seem like an optimization that is likely to be important. Have you run large jobs that show this is important? My concern is that sending the deltas makes the system very vulnerable to losing or duplicating a message. My preference would be to have a boolean in the TaskStatus indicating whether it should send the counters or not, but to always send the current values of all counters.

        I'd also recommend against the current sendCounters and doSendCounters. I think your original names were better: {get,set}SendCounters. Maybe they should be something like {get,set}IncludeCounters...
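
        The concern about deltas raised in this comment can be illustrated with a toy receiver. CounterReceiver is a hypothetical class, and the sketch reads "deltas" as increments that the receiver adds to a stored value, which is an assumption about the comment rather than a description of the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical receiver contrasting two ways of applying counter messages.
class CounterReceiver {
    private final Map<String, Long> totals = new HashMap<>();

    // Delta style: each message is added to the stored value,
    // so a duplicated message is counted twice and a lost one is never recovered.
    void applyDelta(String name, long delta) {
        totals.merge(name, delta, Long::sum);
    }

    // Snapshot style: each message carries the current value,
    // so a duplicate is harmless and a later message repairs a lost one.
    void applySnapshot(String name, long currentValue) {
        totals.put(name, currentValue);
    }

    long get(String name) {
        return totals.getOrDefault(name, 0L);
    }
}
```

        For a counter that really went from 5 to 8, replaying the last message leaves the snapshot receiver at 8 while the delta receiver drifts to 11.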

        Arun C Murthy added a comment -

        Cancelling patch while Owen's feedback gets incorporated...

        Amareshwari Sriramadasu added a comment -

        Submitting patch after incorporating Owen's comments.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12372245/patch-2208.txt
        against trunk revision r607131.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests -1. The patch failed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1426/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1426/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1426/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1426/console

        This message is automatically generated.

        Devaraj Das added a comment -

        I just committed this. Thanks, Amareshwari!


          People

          • Assignee:
            Amareshwari Sriramadasu
            Reporter:
            Amareshwari Sriramadasu
          • Votes:
            0
            Watchers:
            1
