Hadoop Common
  1. Hadoop Common
  2. HADOOP-5710

Counter MAP_INPUT_BYTES missing from new mapreduce api.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      MapTask, with current new mapreduce api, does not maintain MAP_INPUT_BYTES counter. Since RecordReader doesnot have getPos() api, it is not possible for the map task to maintain. Individual record readers (LineRecordReader, SequenceFileRecordReader) could be modified to maintain the same.

      1. patch-5710-3.txt
        13 kB
        Amareshwari Sriramadasu
      2. patch-5710-2.txt
        13 kB
        Amareshwari Sriramadasu
      3. patch-5710-1.txt
        11 kB
        Amareshwari Sriramadasu
      4. patch-5710.txt
        4 kB
        Amareshwari Sriramadasu

        Issue Links

          Activity

          Hide
          Sharad Agarwal added a comment -

          I just committed this. Thanks Amareshwari!

          Show
          Sharad Agarwal added a comment - I just committed this. Thanks Amareshwari!
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch incorporating review comments.

          test-patch result:

               [exec] +1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 12 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec]
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec]
          

          ant test passed on my machine.

          Show
          Amareshwari Sriramadasu added a comment - Patch incorporating review comments. test-patch result: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 12 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] ant test passed on my machine.
          Hide
          Sharad Agarwal added a comment -

          one minor bit: In LineRecordReader and SequenceFileRecordReader, counter lookup is happening for each key. This can be avoided if we keep counter instance as the member variable as opposed to keeping MapContext.

          Show
          Sharad Agarwal added a comment - one minor bit: In LineRecordReader and SequenceFileRecordReader, counter lookup is happening for each key. This can be avoided if we keep counter instance as the member variable as opposed to keeping MapContext.
          Hide
          Amareshwari Sriramadasu added a comment -

          Contrib test failures are not related to patch. They fail on trunk also.

          Show
          Amareshwari Sriramadasu added a comment - Contrib test failures are not related to patch. They fail on trunk also.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12408678/patch-5710-2.txt
          against trunk revision 777761.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12408678/patch-5710-2.txt against trunk revision 777761. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/387/console This message is automatically generated.
          Hide
          Amareshwari Sriramadasu added a comment -

          test-patch result :

               [exec]
               [exec] +1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 12 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec]
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec]
          

          ant test passed on my machine.

          Show
          Amareshwari Sriramadasu added a comment - test-patch result : [exec] [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 12 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] ant test passed on my machine.
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch incorporating review comments. Earlier patch had findbugs warning saying Unchecked/unconfirmed cast fromTaskAttemptContext to MapContext . Added the warning to exclude list, since the warning is unavoidable.

          Show
          Amareshwari Sriramadasu added a comment - Patch incorporating review comments. Earlier patch had findbugs warning saying Unchecked/unconfirmed cast fromTaskAttemptContext to MapContext . Added the warning to exclude list, since the warning is unavoidable.
          Hide
          Sharad Agarwal added a comment -

          Patch looks good. Few minor points:
          should we rename the group and counter to more readable and intuitive value. Say Group name as "FileInputFormatCounters" and counter name as "BYTES_READ", Because anyway we are calculating the bytes read from the file, not really the bytes input to the map (there may be some bytes skipped while reading so these two may not be same).
          include a deprecated warning in the old Counters API while mapping the old name to new name.

          Show
          Sharad Agarwal added a comment - Patch looks good. Few minor points: should we rename the group and counter to more readable and intuitive value. Say Group name as "FileInputFormatCounters" and counter name as "BYTES_READ", Because anyway we are calculating the bytes read from the file, not really the bytes input to the map (there may be some bytes skipped while reading so these two may not be same). include a deprecated warning in the old Counters API while mapping the old name to new name.
          Hide
          Amareshwari Sriramadasu added a comment -

          Attaching patch which moves MAP_INPUT_BYTES to org.apache.hadoop.mapreduce.lib.input.FileInputFormat

          Show
          Amareshwari Sriramadasu added a comment - Attaching patch which moves MAP_INPUT_BYTES to org.apache.hadoop.mapreduce.lib.input.FileInputFormat
          Hide
          Owen O'Malley added a comment -

          That sounds reasonable, Sharad.

          Show
          Owen O'Malley added a comment - That sounds reasonable, Sharad.
          Hide
          Sharad Agarwal added a comment -

          I propose that we remove MAP_INPUT_BYTES from the framework counters as it is specific to file based input formats only. With new api, the framework no more calculates this. It is getting calculated in lib. So we can move MAP_INPUT_BYTES to FileInputFormat class. That seems to be its more logical place.

          Show
          Sharad Agarwal added a comment - I propose that we remove MAP_INPUT_BYTES from the framework counters as it is specific to file based input formats only. With new api, the framework no more calculates this. It is getting calculated in lib. So we can move MAP_INPUT_BYTES to FileInputFormat class. That seems to be its more logical place.
          Hide
          Amareshwari Sriramadasu added a comment -

          Please apply the patch after commiting HADOOP-5717, because compilation would break.

          Show
          Amareshwari Sriramadasu added a comment - Please apply the patch after commiting HADOOP-5717 , because compilation would break.
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch adding MAP_INPUT_BYTES counter to LineRecordReader and SequenceRecordReader.

          Show
          Amareshwari Sriramadasu added a comment - Patch adding MAP_INPUT_BYTES counter to LineRecordReader and SequenceRecordReader.

            People

            • Assignee:
              Amareshwari Sriramadasu
              Reporter:
              Amareshwari Sriramadasu
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development