Hadoop Map/Reduce
MAPREDUCE-2124

Add job counters for measuring time spent in three different phases in reducers

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: jobtracker
    • Labels: None

      Description

      We currently have SLOTS_MILLIS_REDUCES, which measures the total slot time of the reducers.
      It would be useful to also have

      SLOTS_MILLIS_REDUCES_COPY
      SLOTS_MILLIS_REDUCES_SORT
      SLOTS_MILLIS_REDUCES_REDUCE
      

      which measure the three phases of a reducer separately.
      This will help us identify the bottleneck of the reducers.
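
      A sketch of the shape of this change (illustrative only: it assumes the three names are added next to the existing values in the JobCounter enum, and is not the attached patch):

          public enum JobCounter {
            NUM_FAILED_MAPS,
            NUM_FAILED_REDUCES,
            TOTAL_LAUNCHED_MAPS,
            TOTAL_LAUNCHED_REDUCES,
            OTHER_LOCAL_MAPS,
            DATA_LOCAL_MAPS,
            RACK_LOCAL_MAPS,
            SLOTS_MILLIS_MAPS,
            SLOTS_MILLIS_REDUCES,
            // proposed additions from this issue's description:
            SLOTS_MILLIS_REDUCES_COPY,
            SLOTS_MILLIS_REDUCES_SORT,
            SLOTS_MILLIS_REDUCES_REDUCE,
            FALLOW_SLOTS_MILLIS_MAPS,
            FALLOW_SLOTS_MILLIS_REDUCES
          }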

      Attachments

      1. MAPREDUCE-2124.txt (2 kB, Scott Chen)
      2. MAPREDUCE-2124-v2.txt (4 kB, Scott Chen)
      3. test-result.txt (54 kB, Scott Chen)
      4. MAPREDUCE-2124.3.txt (4 kB, Scott Chen)

        Activity

        Chris Douglas added a comment -

        > This will help us identify the bottleneck of the reducers.

        Will it? The shuffle phase can be delayed by the map scheduling, the sort phase is often a noop, and the reduce phase takes whatever time it requires. Moreover, these times are also recorded in the job history. Wouldn't parsing that, or just using Rumen to do the analysis, be sufficient? The cost of additional framework counters in the JobTracker is pretty high...

        dhruba borthakur added a comment -

        > The cost of additional framework counters in the JobTracker is pretty high

        Are you worried about memory or CPU? Doesn't one job-counter require only one object per job?

        Chris Douglas added a comment -

        > Are you worried about memory or CPU? Doesn't one job-counter require only one object per job?

        Memory. The counters are recorded per task.

        Scott Chen added a comment -

        Hey Chris,

        We can put this in a job counter (the one in JobInProgress) instead of a task counter,
        so these counters will be recorded per job. The memory cost should be very minor.
        The cost of collecting them is also very low: we can do it in JobInProgress.meterTaskAttempt(), which is called only once, after an attempt has finished.

        These times are recorded in the job history, but it would be nice to have them on the web UI if there is not much cost.
        What do you think?

        Scott
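
        A minimal sketch of the accumulation described above (JobInProgress.meterTaskAttempt() is named in this comment, but the method body, the jobCounters field, and the omission of the slot multiplier are illustrative assumptions, not the attached patch):

            // Sketch only: accumulate per-phase reduce time into the
            // job-level counters when a reduce attempt is metered.
            // TaskStatus records the shuffle and sort finish times.
            void meterTaskAttempt(TaskInProgress tip, TaskStatus status) {
              if (!tip.isMapTask()) {
                long start   = status.getStartTime();
                long shuffle = status.getShuffleFinishTime();
                long sort    = status.getSortFinishTime();
                long finish  = status.getFinishTime();
                // slot multiplier omitted for brevity; the real
                // slot-millis counters scale by slots per reduce
                jobCounters.incrCounter(JobCounter.SLOTS_MILLIS_REDUCES_COPY,
                                        shuffle - start);
                jobCounters.incrCounter(JobCounter.SLOTS_MILLIS_REDUCES_SORT,
                                        sort - shuffle);
                jobCounters.incrCounter(JobCounter.SLOTS_MILLIS_REDUCES_REDUCE,
                                        finish - sort);
              }
            }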

        Scott Chen added a comment -

        > Will it? The shuffle phase can be delayed by the map scheduling, the sort phase is often a noop, and the reduce phase takes whatever time it requires.

        I agree with you. But we will have more information about how much time is actually spent on reduce and how much is spent by the framework (map scheduling delay and the merge sort).

        Chris Douglas added a comment -

        > These times are recorded in the job history, but it would be nice to have them on the web UI if there is not much cost.

        OK, I see; I thought the proposal was to add a task counter. Is this intended for the job summary log or for feedback to users? I guess one could consider the shuffle bytes and these times to help tune their job... OK, I think I understand. Thanks for explaining. Should this ignore failed/killed tasks? e.g. speculatively executed reduces killed in the shuffle stage probably shouldn't add to the accumulator.

        Luke Lu added a comment -

        Please update the JobCounter.properties as well.
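
        For reference, JobCounter.properties maps enum names to display strings. A hedged sketch of what the new entries might look like (the display wording here is an assumption, not the wording in the patch):

            # Illustrative display names for the proposed counters
            SLOTS_MILLIS_REDUCES_COPY.name=   Total time spent by all reduces copying map output (ms)
            SLOTS_MILLIS_REDUCES_SORT.name=   Total time spent by all reduces sorting (ms)
            SLOTS_MILLIS_REDUCES_REDUCE.name= Total time spent by all reduces in the reduce phase (ms)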

        Scott Chen added a comment -

        > e.g. speculatively executed reduces killed in the shuffle stage probably shouldn't add to the accumulator.

        Chris: I see. That makes sense. I will collect this in completedTask() only.

        Luke: Thanks for the reminder. I will be sure to add the change to JobCounter.properties.

        Scott Chen added a comment -

        Update: changed JobCounter.properties and the unit test.

        @Chris: I still collect the time for failed and killed tasks, because I want to keep this consistent with SLOT_MILLIS_REDUCE and SLOT_MILLIS_MAP. Let me know if this is OK with you.

        Chris Douglas added a comment -

        Either way is OK with me, but I'm not wholly clear on its intended audience. The SLOT_MILLIS_* counters are useful to operators and developers, as they provide information about the efficiency of the scheduler: they're useful for bottleneck analysis of repeated sets of jobs, tuning of the aggregate cluster, and comparing different runs of concurrent pipelines. By only accumulating the time that was actually spent doing work, the proposed counters could measure the efficiency of the job and be useful to the user, for tuning parameters like slowstart (a long shuffle time for small amounts of intermediate data might indicate that the job is scheduling reduces too early).

        Most of the framework counters (FileSystem, framework bytes and records) provide feedback to the user, to help determine if their job is written correctly and tuned efficiently. This is slightly different, because it's not a property of a particular MapReduce job (e.g. a job where every reduce fails once could look "efficient" by this metric). I guess my question would be: if this information is presented in every user job, then how should (s)he react to it? If it's not user-centric and only another presentation of data the operator already has, then it seems less motivated to me. All that said, the cost is low, so if you feel it's useful then I've no objection to it.
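
        (For reference, the slowstart knob mentioned above is the mapred.reduce.slowstart.completed.maps job property; a hedged example with an illustrative value:)

            <property>
              <!-- launch reduces only after 80% of maps have finished,
                   so the copy phase spends less slot time sitting idle -->
              <name>mapred.reduce.slowstart.completed.maps</name>
              <value>0.80</value>
            </property>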

        Scott Chen added a comment -

        Hey Chris: Thanks for the reply. I think some users would be interested in seeing how much time the reducers actually spend in the reduce phase (some advanced users, at least). And since the cost of doing this is low, I feel it is a good thing to have.

        Chris Douglas added a comment -

        Soright. +1 on the patch, assuming Hudson comes back clean

        Scott Chen added a comment -

        Looks like Hudson got stuck again. I am posting the test-patch result here.

             [exec] BUILD SUCCESSFUL
             [exec] Total time: 2 minutes 3 seconds
             [exec]
             [exec]
             [exec]
             [exec]
             [exec] +1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
             [exec]
             [exec]     +1 system tests framework.  The patch passed system tests framework compile.
             [exec]
             [exec]
             [exec]
             [exec]
             [exec] ======================================================================
             [exec] ======================================================================
             [exec]     Finished build.
             [exec] ======================================================================
             [exec] ======================================================================
             [exec]
             [exec]
        
        BUILD SUCCESSFUL
        Total time: 20 minutes 28 seconds
        
        Scott Chen added a comment -

        I put the test results in the attached file.
        The following test failed, but it is not relevant to this change.

        [junit] Running org.apache.hadoop.mapred.TestControlledMapReduceJob
        [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
        
        Amar Kamat added a comment -

        Scott,
        If I understand correctly, you intend to display/store the amount of time spent by the reducer during the shuffle, sort, and reduce phases in the task/attempt counters for completed tasks. Currently, once the task is done, the shuffle and sort times are stored in the job history (via TaskStatus). Why can't you simply use this, instead of counters, to display the shuffle and sort times for reducers on the web UI (and store them in TaskReport)?

        Scott Chen added a comment -

        Hey Amar,

        Thanks for the comment.
        The information this patch adds is time aggregated for the job, not for each task.
        The per-task information can already be viewed on the individual task page.

        It is similar to SLOTS_MILLIS_REDUCES but divided into the different phases.
        The goal is to provide a UI that shows how much total time the whole job spends in each phase.
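
        As a usage sketch, a client could read the per-phase totals for a finished job like this (old mapred API; the helper method and its arguments are illustrative, not part of the patch):

            // Sketch: print the aggregated per-phase reduce times for a job.
            // jobClient and jobId are assumed to be set up by the caller.
            static void printReducePhaseTimes(JobClient jobClient, JobID jobId)
                throws IOException {
              Counters counters = jobClient.getJob(jobId).getCounters();
              System.out.println("copy   (ms): "
                  + counters.getCounter(JobCounter.SLOTS_MILLIS_REDUCES_COPY));
              System.out.println("sort   (ms): "
                  + counters.getCounter(JobCounter.SLOTS_MILLIS_REDUCES_SORT));
              System.out.println("reduce (ms): "
                  + counters.getCounter(JobCounter.SLOTS_MILLIS_REDUCES_REDUCE));
            }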

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12458569/test-result.txt
        against trunk revision 1074251.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 68 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/60//console

        This message is automatically generated.

        Tom White added a comment -

        Patch no longer applies.

        Scott Chen added a comment -

        Rebase the patch

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12472483/MAPREDUCE-2124.3.txt
        against trunk revision 1076411.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/113//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/113//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/113//console

        This message is automatically generated.

        Arun C Murthy added a comment -

        Sorry to come in late, the patch has gone stale. Can you please rebase? Thanks.

        Scott Chen added a comment -

        Thanks for the comment. I will rebase this soon.


          People

          • Assignee: Scott Chen
          • Reporter: Scott Chen
          • Votes: 0
          • Watchers: 12
