Hadoop Map/Reduce: MAPREDUCE-1073

Progress reported for pipes tasks is incorrect.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.1
    • Fix Version/s: None
    • Component/s: pipes
    • Labels:
      None

      Description

      Currently in pipes, org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader<K1, V1>, OutputCollector<K2, V2>, Reporter) does the following:

              while (input.next(key, value)) {
                downlink.mapItem(key, value);
                if(skipping) {
                  downlink.flush();
                }
              }
      

      This loop consumes all the records for the current task and takes the task's progress to 100%, while the actual pipes application is still trailing behind.
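
      A minimal sketch of the effect described above, assuming the downlink behaves like an unbounded buffer between the Java task and the pipes application. The class ProgressLagDemo, its queue, and its helper methods are hypothetical stand-ins for illustration; they are not Hadoop code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class ProgressLagDemo {
    /** Progress as inferred from input reads: records read divided by total records. */
    static float readerProgress(int recordsRead, int totalRecords) {
        return totalRecords == 0 ? 1.0f : (float) recordsRead / totalRecords;
    }

    /**
     * Forwards every record to a buffer without waiting for a consumer,
     * then returns {progress inferred from reads, records actually processed}.
     */
    static float[] simulate(int totalRecords) {
        // Stand-in for the pipe to the external application (assumption: unbounded).
        Queue<String> downlinkBuffer = new ArrayDeque<>();

        int recordsRead = 0;
        while (recordsRead < totalRecords) {
            downlinkBuffer.add("record-" + recordsRead);  // plays the role of downlink.mapItem
            recordsRead++;
        }

        // Nothing has drained the buffer yet, so no record has been processed.
        int recordsProcessed = totalRecords - downlinkBuffer.size();
        return new float[] { readerProgress(recordsRead, totalRecords), recordsProcessed };
    }

    public static void main(String[] args) {
        float[] result = simulate(4);
        System.out.println("progress inferred from reads: " + result[0]);
        System.out.println("records actually processed:   " + (int) result[1]);
    }
}
```

      In this model the read-based progress reaches 1.0 while the count of processed records is still 0, which is the gap this issue describes.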

      Attachments

      1. MAPREDUCE-1073_yhadoop20.patch (1 kB, Arun C Murthy)
      2. mapreduce-1073--2010-03-31.patch (2 kB, Dick King)
      3. mapreduce-1073--2010-04-06.patch (282 kB, Dick King)
      4. MAPREDUCE-1073--yhadoop20--2010-07-22.patch (27 kB, Dick King)
      5. MAPREDUCE-1073--yhadoop20--2010-07-22--1530.patch (26 kB, Dick King)

        Activity

        Robert Joseph Evans added a comment -

        Canceling the patch because the latest patch is over a year old and will no longer apply to trunk. Dick, if you are still interested in getting this patch in, please upmerge and repost it. I would be happy to review it, give you feedback on your approach, and commit it for you. If you have abandoned the JIRA, please post a comment in the JIRA so I can close it for you.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12450229/MAPREDUCE-1073--yhadoop20--2010-07-22--1530.patch
        against trunk revision 1075216.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 7 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/95//console

        This message is automatically generated.

        Dick King added a comment -

        I would like to invite community comment on the approach of https://issues.apache.org/jira/secure/attachment/12450229/MAPREDUCE-1073--yhadoop20--2010-07-22--1530.patch which is described in https://issues.apache.org/jira/browse/MAPREDUCE-1073?focusedCommentId=12891371&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12891371 before I do any forward port.

        Dick King added a comment -

        I revised the patch so that it no longer adds an API to read and set the property that tells MapTask.TrackedRecordReader not to record progress as it reads the input; the code now just reads and sets the property "by hand". Since this is a pipes-specific feature, it should be handled only by a focused attribute, which I have renamed to mapred.pipes.disable.record.reader.progress.

        In place of what I described in https://issues.apache.org/jira/browse/MAPREDUCE-1073?focusedCommentId=12891327&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12891327 , the way to mark a job as having mappers that report their own progress is now just to set the property, which I have renamed from mapred.job.disable.record.reader.progress to mapred.pipes.disable.record.reader.progress because this is a pipes-only concept.
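
        A rough sketch of what reading the property "by hand" could look like, assuming a plain java.util.Properties object stands in for the job configuration. Only the property name comes from this issue; the class ReaderProgressGate and its methods are hypothetical and are not taken from the patch.

```java
import java.util.Properties;

class ReaderProgressGate {
    // Property name as given in this issue.
    static final String DISABLE_KEY = "mapred.pipes.disable.record.reader.progress";

    /** Reads the flag directly; a missing value means reader progress stays enabled. */
    static boolean readerProgressDisabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(DISABLE_KEY, "false"));
    }

    /**
     * Hypothetical stand-in for a tracked record reader's bookkeeping:
     * returns the progress value to keep after another record has been read.
     */
    static float progressAfterRead(Properties conf, float current, int recordsRead, int totalRecords) {
        if (readerProgressDisabled(conf) || totalRecords <= 0) {
            return current;  // leave progress to whatever the application reports
        }
        return (float) recordsRead / totalRecords;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("default:  " + progressAfterRead(conf, 0.0f, 5, 5));

        conf.setProperty(DISABLE_KEY, "true");
        System.out.println("disabled: " + progressAfterRead(conf, 0.0f, 5, 5));
    }
}
```

        With the flag unset, reading all five records moves progress to 1.0; with it set to true, the read leaves the current value untouched.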

        Dick King added a comment -

        In my previous comment I should have said that this patch addresses BOTH points, and is complete modulo a forward port.

        Dick King added a comment -

        The previous versions of this attachment missed one point.

        The basic problem is that with the existing code base the progress is based on the records read from the input split, but there is buffering in the way pipes works. This makes the tasks appear to have made more progress than they deserve to have made, in jobs where the input splits are small.

        To make speculation work under pipes with small input splits, two conditions have to be met:

        1: The pipes code has to have an API to report progress, and has to use it. The old patch met this goal. You call (&context)->setProgress(float) within HadoopPipes::Mapper::map(HadoopPipes::MapContext& context). This does require that you have a way of measuring progress, which I consider likely because this is only needed when the input splits are small, which implies that the "input data" is really a signal to get the real data somewhere else [or to generate it].

        2: The job has to be able to say that the progress that would otherwise be inferred from input split reads has to be ignored. This newest version of the patch does that; you can either call JobConf.setRecordReaderProgressDisabled(true), or set the attribute mapred.job.disable.record.reader.progress to true.

        This patch addresses the second point. I did not mark it available because it needs a forward port. I attached it to this issue for comments, and for the record.
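
        The first condition can be illustrated with a small sketch. The real call lives in the C++ pipes API; the Java interface below is only a hypothetical analogue that models setProgress, and the class SelfReportingMapper is invented for illustration. It assumes the single input record is a signal to generate a known number of items, so the mapper can compute its own progress.

```java
import java.util.ArrayList;
import java.util.List;

class SelfReportingMapper {
    /** Hypothetical Java analogue of the pipes map context; only setProgress is modeled. */
    interface MapContext {
        void setProgress(float progress);
    }

    /** Treats the input record as a signal to generate itemsToGenerate items. */
    static void map(String inputSignal, int itemsToGenerate, MapContext context) {
        for (int i = 1; i <= itemsToGenerate; i++) {
            // ... generate or fetch the i-th item here ...
            context.setProgress((float) i / itemsToGenerate);  // report real progress
        }
    }

    /** Runs the mapper once and records every progress value it reported. */
    static List<Float> run(int itemsToGenerate) {
        List<Float> reported = new ArrayList<>();
        map("generate", itemsToGenerate, p -> reported.add(p));
        return reported;
    }

    public static void main(String[] args) {
        System.out.println(run(4));
    }
}
```

        Here the input split holds one record, so read-based progress would jump straight to 100%, while the values passed to setProgress rise step by step as the work is actually done.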

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12440953/mapreduce-1073--2010-04-06.patch
        against trunk revision 955068.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 8 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/572/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/572/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/572/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/572/console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12440953/mapreduce-1073--2010-04-06.patch
        against trunk revision 946955.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 8 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/199/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/199/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/199/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/199/console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12440953/mapreduce-1073--2010-04-06.patch
        against trunk revision 931274.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 8 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/96/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/96/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/96/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/96/console

        This message is automatically generated.

        Dick King added a comment -

        This patch is as large as it is because it includes the removal of src/examples/pipes/aclocal.m4. That file is a derived file that should not be included in the code base.

        Dick King added a comment -

        Removed this patch to replace it with another patch that tests its functionality.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12440406/mapreduce-1073--2010-03-31.patch
        against trunk revision 929712.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/83/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/83/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/83/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/83/console

        This message is automatically generated.

        Dick King added a comment -

        I've checked that the patch marks progress continuously, provided your code calls the new API and has a way to compute its progress [which is not always available].

        Arun C Murthy added a comment -

        Forgot to thank Christian for the patch!

        Arun C Murthy added a comment -

        Adding a 'setProgress' API for pipes applications.

        Sreekanth Ramakrishnan added a comment -

        The incorrect progress affects the scheduling of speculative tasks for pipes jobs, because the progress reported for every pipes task would be 100%.
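
        To illustrate why this matters, here is a simplified sketch of a progress-based speculation check. The rule used (speculate on tasks whose progress trails the mean by more than a fixed gap) and the class SpeculationSketch are assumptions made for illustration; this is not the scheduler's actual code.

```java
import java.util.ArrayList;
import java.util.List;

class SpeculationSketch {
    /** Returns the indexes of tasks whose progress trails the mean by more than gap. */
    static List<Integer> laggingTasks(float[] progress, float gap) {
        float sum = 0f;
        for (float p : progress) {
            sum += p;
        }
        float mean = progress.length == 0 ? 0f : sum / progress.length;

        List<Integer> lagging = new ArrayList<>();
        for (int i = 0; i < progress.length; i++) {
            if (mean - progress[i] > gap) {
                lagging.add(i);
            }
        }
        return lagging;
    }

    public static void main(String[] args) {
        // What the check sees when every pipes task reports 100% right away.
        System.out.println(laggingTasks(new float[] {1.0f, 1.0f, 1.0f}, 0.2f));
        // What it would see if the tasks reported their real progress.
        System.out.println(laggingTasks(new float[] {0.9f, 0.8f, 0.1f}, 0.2f));
    }
}
```

        When every task claims 100%, no task ever trails the mean, so a slow task is never picked as a candidate for a speculative attempt.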


          People

          • Assignee: Dick King
          • Reporter: Sreekanth Ramakrishnan
          • Votes: 1
          • Watchers: 8
