Hadoop Common
  1. Hadoop Common
  2. HADOOP-5710

Counter MAP_INPUT_BYTES missing from new mapreduce api.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      MapTask, with current new mapreduce api, does not maintain MAP_INPUT_BYTES counter. Since RecordReader doesnot have getPos() api, it is not possible for the map task to maintain. Individual record readers (LineRecordReader, SequenceFileRecordReader) could be modified to maintain the same.

      1. patch-5710-3.txt
        13 kB
        Amareshwari Sriramadasu
      2. patch-5710-2.txt
        13 kB
        Amareshwari Sriramadasu
      3. patch-5710-1.txt
        11 kB
        Amareshwari Sriramadasu
      4. patch-5710.txt
        4 kB
        Amareshwari Sriramadasu

        Issue Links

          Activity

          Amareshwari Sriramadasu created issue -
          Amareshwari Sriramadasu made changes -
          Field Original Value New Value
          Link This issue blocks HADOOP-5696 [ HADOOP-5696 ]
          Amareshwari Sriramadasu made changes -
          Link This issue is blocked by HADOOP-5717 [ HADOOP-5717 ]
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch adding MAP_INPUT_BYTES counter to LineRecordReader and SequenceRecordReader.

          Show
          Amareshwari Sriramadasu added a comment - Patch adding MAP_INPUT_BYTES counter to LineRecordReader and SequenceRecordReader.
          Amareshwari Sriramadasu made changes -
          Attachment patch-5710.txt [ 12406908 ]
          Hide
          Amareshwari Sriramadasu added a comment -

          Please apply the patch after commiting HADOOP-5717, because compilation would break.

          Show
          Amareshwari Sriramadasu added a comment - Please apply the patch after commiting HADOOP-5717 , because compilation would break.
          Amareshwari Sriramadasu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Sharad Agarwal added a comment -

          I propose that we remove MAP_INPUT_BYTES from the framework counters as it is specific to file based input formats only. With new api, the framework no more calculates this. It is getting calculated in lib. So we can move MAP_INPUT_BYTES to FileInputFormat class. That seems to be its more logical place.

          Show
          Sharad Agarwal added a comment - I propose that we remove MAP_INPUT_BYTES from the framework counters as it is specific to file based input formats only. With new api, the framework no more calculates this. It is getting calculated in lib. So we can move MAP_INPUT_BYTES to FileInputFormat class. That seems to be its more logical place.
          Hide
          Owen O'Malley added a comment -

          That sounds reasonable, Sharad.

          Show
          Owen O'Malley added a comment - That sounds reasonable, Sharad.
          Amareshwari Sriramadasu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Amareshwari Sriramadasu added a comment -

          Attaching patch which moves MAP_INPUT_BYTES to org.apache.hadoop.mapreduce.lib.input.FileInputFormat

          Show
          Amareshwari Sriramadasu added a comment - Attaching patch which moves MAP_INPUT_BYTES to org.apache.hadoop.mapreduce.lib.input.FileInputFormat
          Amareshwari Sriramadasu made changes -
          Attachment patch-5710-1.txt [ 12408553 ]
          Hide
          Sharad Agarwal added a comment -

          Patch looks good. Few minor points:
          should we rename the group and counter to more readable and intuitive value. Say Group name as "FileInputFormatCounters" and counter name as "BYTES_READ", Because anyway we are calculating the bytes read from the file, not really the bytes input to the map (there may be some bytes skipped while reading so these two may not be same).
          include a deprecated warning in the old Counters API while mapping the old name to new name.

          Show
          Sharad Agarwal added a comment - Patch looks good. Few minor points: should we rename the group and counter to more readable and intuitive value. Say Group name as "FileInputFormatCounters" and counter name as "BYTES_READ", Because anyway we are calculating the bytes read from the file, not really the bytes input to the map (there may be some bytes skipped while reading so these two may not be same). include a deprecated warning in the old Counters API while mapping the old name to new name.
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch incorporating review comments. Earlier patch had findbugs warning saying Unchecked/unconfirmed cast fromTaskAttemptContext to MapContext . Added the warning to exclude list, since the warning is unavoidable.

          Show
          Amareshwari Sriramadasu added a comment - Patch incorporating review comments. Earlier patch had findbugs warning saying Unchecked/unconfirmed cast fromTaskAttemptContext to MapContext . Added the warning to exclude list, since the warning is unavoidable.
          Amareshwari Sriramadasu made changes -
          Attachment patch-5710-2.txt [ 12408678 ]
          Hide
          Amareshwari Sriramadasu added a comment -

          test-patch result :

               [exec]
               [exec] +1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 12 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec]
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec]
          

          ant test passed on my machine.

          Show
          Amareshwari Sriramadasu added a comment - test-patch result : [exec] [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 12 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] ant test passed on my machine.
          Amareshwari Sriramadasu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12408678/patch-5710-2.txt
          against trunk revision 777761.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12408678/patch-5710-2.txt against trunk revision 777761. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/console This message is automatically generated.
          Hide
          Amareshwari Sriramadasu added a comment -

          Contrib test failures are not related to patch. They fail on trunk also.

          Show
          Amareshwari Sriramadasu added a comment - Contrib test failures are not related to patch. They fail on trunk also.
          Hide
          Sharad Agarwal added a comment -

          one minor bit: In LineRecordReader and SequenceFileRecordReader, counter lookup is happening for each key. This can be avoided if we keep counter instance as the member variable as opposed to keeping MapContext.

          Show
          Sharad Agarwal added a comment - one minor bit: In LineRecordReader and SequenceFileRecordReader, counter lookup is happening for each key. This can be avoided if we keep counter instance as the member variable as opposed to keeping MapContext.
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch incorporating review comments.

          test-patch result:

               [exec] +1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 12 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec]
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec]
          

          ant test passed on my machine.

          Show
          Amareshwari Sriramadasu added a comment - Patch incorporating review comments. test-patch result: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 12 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] ant test passed on my machine.
          Amareshwari Sriramadasu made changes -
          Attachment patch-5710-3.txt [ 12408938 ]
          Hide
          Sharad Agarwal added a comment -

          I just committed this. Thanks Amareshwari!

          Show
          Sharad Agarwal added a comment - I just committed this. Thanks Amareshwari!
          Sharad Agarwal made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Owen O'Malley made changes -
          Component/s mapred [ 12310690 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Patch Available Patch Available Open Open
          19d 17h 4m 1 Amareshwari Sriramadasu 20/May/09 05:44
          Open Open Patch Available Patch Available
          10d 13h 19m 2 Amareshwari Sriramadasu 21/May/09 11:58
          Patch Available Patch Available Resolved Resolved
          4d 25m 1 Sharad Agarwal 25/May/09 12:23
          Resolved Resolved Closed Closed
          456d 9h 13m 1 Tom White 24/Aug/10 21:37

            People

            • Assignee:
              Amareshwari Sriramadasu
              Reporter:
              Amareshwari Sriramadasu
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development