Hadoop Map/Reduce / MAPREDUCE-743

Progress of map phase in map task is not updated properly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: task
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The progress of the map phase in a map task is not updated properly. TrackedRecordReader and NewTrackingRecordReader should set their progress on the progress object of the map phase. Instead, they were setting it as the progress of the whole task, and because the task has phases, that value is not counted as part of the map task's progress.
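
To make the failure mode concrete, here is a minimal sketch of a progress tree in which a node that has phases derives its value from those phases and ignores any value set on the node itself. PhaseNode and PhaseProgressDemo are hypothetical names, the equal weighting of phases is an assumption, and this is not Hadoop's actual Progress class.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of a progress tree; not Hadoop's Progress class. */
class PhaseNode {
    private final List<PhaseNode> phases = new ArrayList<>();
    private float own; // only reported when this node has no phases

    PhaseNode addPhase() {
        PhaseNode phase = new PhaseNode();
        phases.add(phase);
        return phase;
    }

    void set(float value) {
        own = value;
    }

    /** A node with phases reports the average of its phases (assumed equal weights). */
    float get() {
        if (phases.isEmpty()) {
            return own;
        }
        float sum = 0f;
        for (PhaseNode phase : phases) {
            sum += phase.get();
        }
        return sum / phases.size();
    }
}

class PhaseProgressDemo {
    public static void main(String[] args) {
        PhaseNode task = new PhaseNode();
        PhaseNode mapPhase = task.addPhase();
        task.addPhase(); // a second phase, e.g. sort

        task.set(0.5f); // reported behavior: value set on the task node is ignored
        System.out.println(task.get()); // prints 0.0

        mapPhase.set(0.5f); // intended behavior: value set on the map phase
        System.out.println(task.get()); // prints 0.25
    }
}
```

In this model, the record reader's progress only becomes visible once it is written to the phase node, which is the behavior the description asks for.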

      1. MR-743.patch
        0.5 kB
        Ravi Gummadi
      2. MR-743.v1.patch
        8 kB
        Ravi Gummadi
      3. MR-743.v2.patch
        12 kB
        Ravi Gummadi
      4. MR-743.v2.1.patch
        12 kB
        Ravi Gummadi
      5. MR-743.v2.2.patch
        12 kB
        Ravi Gummadi
      6. MR-743.v2.3.patch
        12 kB
        Ravi Gummadi
      7. MR-743.v3.patch
        13 kB
        Ravi Gummadi

          Activity

          Ravi Gummadi added a comment -

          Attaching patch fixing the issue.

          Please review and provide your comments.

          Ravi Gummadi added a comment -

          Made some minor changes to the fix.
          Added a testcase to verify map progress.

          Please review and provide your comments.

          Ravi Gummadi added a comment -

          When compressed files are given as input to maps, the progress is not updated. The uncompressed size of the input file is taken to be Long.MAX_VALUE, so the progress of a map task with a compressed input file is ignored because each value is a tiny multiple of 1/Long.MAX_VALUE. The progress values seen are of the order of 10^-17 to 10^-11.
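
The arithmetic behind those magnitudes can be sketched as follows. SplitProgress and its progress() helper are hypothetical, and the formula (bytes consumed divided by split length, clamped to 1.0) is an assumed simplification of how a record reader computes progress, not a copy of Hadoop's code.

```java
class SplitProgress {
    /** Fraction of the split consumed so far, clamped to 1.0. */
    static float progress(long start, long pos, long end) {
        if (end == start) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        // Stand-in for "uncompressed size unknown", as described in the comment above.
        long unknownEnd = Long.MAX_VALUE;

        // After 100 bytes: on the order of 1e-17.
        System.out.println(progress(0, 100L, unknownEnd));

        // After 100 MB: on the order of 1e-11.
        System.out.println(progress(0, 100L * 1024 * 1024, unknownEnd));
    }
}
```

With a denominator near 9.2 * 10^18, even hundreds of megabytes of input leave the fraction indistinguishable from zero in any displayed percentage.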

          I found a page, http://www.abeel.be/content/determine-uncompressed-size-gzip-file, which says that the last 4 bytes of a gzipped file contain the uncompressed file size. But this works only if the size is < 4GB.

          Any thoughts on getting the uncompressed size of compressed files (at least for gzipped files)?
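
For reference, reading that trailer could look like the sketch below. GzipTrailer and readIsize are hypothetical names and nothing here is taken from the attached patches. The gzip format (RFC 1952) stores ISIZE, the uncompressed size modulo 2^32, little-endian in the last four bytes, which is why the value is only reliable below 4GB. For a file made of several concatenated gzip members, the trailer describes only the last member.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class GzipTrailer {
    /**
     * Returns the ISIZE field of a gzip file: the uncompressed size modulo 2^32,
     * stored little-endian in the last four bytes of the file.
     */
    static long readIsize(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            // A gzip member has at least a 10-byte header and an 8-byte trailer.
            if (file.length() < 18) {
                throw new IOException("too short to be a gzip file: " + path);
            }
            file.seek(file.length() - 4);
            long b0 = file.read();
            long b1 = file.read();
            long b2 = file.read();
            long b3 = file.read();
            // Assemble the four bytes, least significant first.
            return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
        }
    }
}
```

Because the value wraps at 2^32, it can at best serve as a hint for progress reporting rather than as an exact size.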

          Ravi Gummadi added a comment -

          Attaching a new patch. The testcase no longer starts a job; it calls MapTask.run() directly (similar to LocalJobRunner) and uses a custom TaskReporter that validates map phase progress.

          This patch depends on the patch for HADOOP-6163.

          Please review and provide your comments.

          Ravi Gummadi added a comment -

          One unit test failed with the previous patch because of an issue in LocalJobRunner.
          Attaching a new patch fixing the issue.

          All unit tests passed on my local machine.

          Please review and provide your comments.

          Ravi Gummadi added a comment -

          Attaching a new patch that cleans up the testcase code so that it calls the mapTask.run() method directly. TestMapTask no longer overrides run(); it overrides the new method startReporter() instead.
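
The shape of that change, a hook method that a test subclass overrides while run() stays untouched, can be sketched as below. All names (ProgressSink, SketchTask, ValidatingSketchTask) are hypothetical stand-ins and do not reflect the signatures of MapTask or TaskReporter. The monotonicity check is an assumed example of what "validating" progress might mean.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a reporter; not Hadoop's TaskReporter. */
interface ProgressSink {
    void setProgress(float progress);
}

/** Hypothetical stand-in for a task; not Hadoop's MapTask. */
class SketchTask {
    /** Hook: subclasses can supply a different reporter without overriding run(). */
    protected ProgressSink startReporter() {
        return progress -> { /* a production reporter would forward progress upstream */ };
    }

    public final void run(int records) {
        ProgressSink reporter = startReporter();
        for (int i = 1; i <= records; i++) {
            reporter.setProgress(i / (float) records);
        }
    }
}

/** Test subclass: overrides only the hook and checks that progress never decreases. */
class ValidatingSketchTask extends SketchTask {
    final List<Float> seen = new ArrayList<>();

    @Override
    protected ProgressSink startReporter() {
        return progress -> {
            if (!seen.isEmpty() && progress < seen.get(seen.size() - 1)) {
                throw new IllegalStateException("progress went backwards: " + progress);
            }
            seen.add(progress);
        };
    }
}
```

Keeping run() unmodified in the test subclass means the test exercises the same control flow as production code and swaps only the reporting endpoint.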

          Please review and provide your comments.

          Ravi Gummadi added a comment -

          I missed adding the Apache license header to the new testcase file.
          Attaching a new patch.

          Ravi Gummadi added a comment -

          When I included a hadoop-core-0.21.0-dev.jar built with the patch of HADOOP-6163, unit tests passed on my local machine.

          ant test-patch gave:

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Ravi Gummadi added a comment -

          Attaching a patch that no longer checks whether phases exist in TaskReporter.setProgress(), since no task both needs to set progress and lacks phases (map tasks and reduce tasks both have phases).
          The map phase in map tasks and the reduce phase in reduce tasks use TaskReporter.setProgress().

          Please review and provide your comments.

          Ravi Gummadi added a comment -

          Unit tests passed on my local machine.

          ant test-patch gave:

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Devaraj Das added a comment -

          I just committed this. Thanks, Ravi!

          Tsz Wo Nicholas Sze added a comment -

          Since test-patch is broken (see HADOOP-6124), the committed patch introduced a javac warning that went undetected.

              [javac] d:\@sze\hadoop\mapreduce\m1\src\java\org\apache\hadoop\mapred\LocalJobRunner.java:74: warning: [unchecked]
           unchecked call to serialize(T) as a member of the raw type org.apache.hadoop.io.serializer.Serializer
              [javac]       serializer.serialize(splits.get(i));
              [javac]                           ^
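
For context, javac reports this kind of warning when a generic interface is used through its raw type. The sketch below uses a hypothetical MiniSerializer interface rather than org.apache.hadoop.io.serializer.Serializer, and it does not claim to show how LocalJobRunner was or should be changed. It only illustrates that declaring the variable with a type argument keeps the call type-checked.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature of a generic serializer interface, for illustration only. */
interface MiniSerializer<T> {
    void serialize(T value);
}

class RawTypeDemo {
    static List<String> serializeAll(List<String> splits) {
        List<String> out = new ArrayList<>();

        // A raw declaration such as
        //   MiniSerializer serializer = ...;
        // makes javac report "[unchecked] unchecked call to serialize(T)"
        // at the call site below.

        // A parameterized declaration lets the compiler check the argument type.
        MiniSerializer<String> serializer = value -> out.add("ser:" + value);

        for (int i = 0; i < splits.size(); i++) {
            serializer.serialize(splits.get(i));
        }
        return out;
    }
}
```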
          

            People

            • Assignee:
              Ravi Gummadi
              Reporter:
              Ravi Gummadi
            • Votes:
              1
              Watchers:
              4
