Hadoop Map/Reduce
MAPREDUCE-743

Progress of map phase in map task is not updated properly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: task
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The progress of the map phase in a map task is not updated properly. The progress reported by TrackedRecordReader and NewTrackingRecordReader should be set on the progress object of the map phase. Instead, it was being set as the progress of the whole task, and because the task has phases, that value is not counted as part of the map task's progress.
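
To illustrate why a value set on the whole task can be lost once the task has phases, here is a minimal sketch. It is a simplified model, not Hadoop's actual progress class: the names `PhaseProgressSketch`, `Node`, `addPhase`, `set` and `get` are hypothetical, and equal phase weights are an assumption made only for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of a phase-based progress tree (not Hadoop's class). */
public class PhaseProgressSketch {

    static class Node {
        private final List<Node> phases = new ArrayList<>();
        private float own; // only consulted when this node has no phases

        /** Adds a child phase and returns it. */
        Node addPhase() {
            Node phase = new Node();
            phases.add(phase);
            return phase;
        }

        /** Records a progress value directly on this node. */
        void set(float value) {
            own = value;
        }

        /** A node with phases derives its progress from them and ignores its own value. */
        float get() {
            if (phases.isEmpty()) {
                return own;
            }
            float sum = 0f;
            for (Node phase : phases) {
                sum += phase.get();
            }
            return sum / phases.size(); // assumption: all phases weigh the same
        }
    }

    public static void main(String[] args) {
        Node task = new Node();
        Node mapPhase = task.addPhase();
        task.addPhase(); // a second, later phase of the same task

        task.set(0.8f); // bug pattern: set on the whole task, shadowed by the phases
        System.out.println("set on task:      " + task.get());

        mapPhase.set(0.8f); // fix pattern: set on the map phase, visible through the task
        System.out.println("set on map phase: " + task.get());
    }
}
```

In this model the first call has no visible effect on the task's reported progress, while the second one does, which is the behaviour the description asks for.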

      1. MR-743.patch
        0.5 kB
        Ravi Gummadi
      2. MR-743.v1.patch
        8 kB
        Ravi Gummadi
      3. MR-743.v2.1.patch
        12 kB
        Ravi Gummadi
      4. MR-743.v2.2.patch
        12 kB
        Ravi Gummadi
      5. MR-743.v2.3.patch
        12 kB
        Ravi Gummadi
      6. MR-743.v2.patch
        12 kB
        Ravi Gummadi
      7. MR-743.v3.patch
        13 kB
        Ravi Gummadi

          Activity

          Ravi Gummadi created issue -
          Ravi Gummadi added a comment -

          Attaching patch fixing the issue.

          Please review and provide your comments.

          Ravi Gummadi made changes -
          Field Original Value New Value
          Attachment MR-743.patch [ 12412950 ]
          Ravi Gummadi added a comment -

          Made some minor changes to the fix.
          Added a testcase to verify map progress.

          Please review and provide your comments.

          Ravi Gummadi made changes -
          Attachment MR-743.v1.patch [ 12413270 ]
          Ravi Gummadi added a comment -

          When compressed files are given as input to maps, the progress is not updated. The size of the input file (the uncompressed size) is taken to be Long.MAX_VALUE, so the progress of a map task with a compressed file as input is effectively ignored because the value is tiny, on the order of 1/Long.MAX_VALUE. The progress values seen are on the order of 10^-17 to 10^-11.
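
The arithmetic can be sketched as follows. This is only an illustration of a position-based progress ratio; `SplitProgressSketch` and `fractionRead` are hypothetical names, and the formula is an assumption about how such a reader typically computes progress, not a copy of the Hadoop code.

```java
public class SplitProgressSketch {

    /**
     * Fraction of the input consumed so far, computed from byte positions.
     * start and end delimit the input; pos is the current read position.
     */
    static float fractionRead(long start, long pos, long end) {
        if (end == start) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        // Known length: half of a 128 MB input has been read.
        System.out.println(fractionRead(0, 64L << 20, 128L << 20));

        // Unknown length modelled as Long.MAX_VALUE: even after 100 MB
        // the ratio stays vanishingly small.
        System.out.println(fractionRead(0, 100_000_000L, Long.MAX_VALUE));
    }
}
```

With an end of Long.MAX_VALUE (about 9.2 * 10^18), a position of roughly 100 bytes gives a ratio near 10^-17 and roughly 100 MB gives a ratio near 10^-11, which matches the range reported above.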

          I found a page on the web, http://www.abeel.be/content/determine-uncompressed-size-gzip-file, which says that the last 4 bytes of a gzipped file contain the uncompressed file size. But this works only if the size is < 4GB.

          Any thoughts on getting the uncompressed file size of compressed files (at least for gzipped files)?
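
For reference, a minimal sketch of reading that trailer field with the JDK only. The class and method names (`GzipIsizeSketch`, `isize`, `gzip`) are hypothetical. The gzip format (RFC 1952) stores the field little-endian as the uncompressed size modulo 2^32, which is where the 4GB limit comes from; it is stored per gzip member, so for a file made of concatenated members the last 4 bytes would describe only the final member.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.GZIPOutputStream;

public class GzipIsizeSketch {

    /** Reads the last 4 bytes of a gzip stream as an unsigned little-endian integer. */
    static long isize(byte[] gz) {
        if (gz.length < 18) { // 10-byte header plus 8-byte trailer at minimum
            throw new IllegalArgumentException("too short to be a gzip stream");
        }
        int raw = ByteBuffer.wrap(gz, gz.length - 4, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return raw & 0xFFFFFFFFL; // treat the 32-bit value as unsigned
    }

    /** Helper for the demo: gzips a byte array in memory. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip(new byte[1000]);
        System.out.println("uncompressed size from trailer: " + isize(compressed));
    }
}
```

For a file on disk the same 4 bytes could be read by seeking to the end of the file, so the whole file need not be decompressed.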

          Ravi Gummadi made changes -
          Link This issue is blocked by HADOOP-6163 [ HADOOP-6163 ]
          Ravi Gummadi added a comment -

          Attaching a new patch. The test case no longer starts a job; it calls MapTask.run() directly (similar to LocalJobRunner) and uses a custom TaskReporter that validates map phase progress.

          This patch depends on the patch for HADOOP-6163.

          Please review and provide your comments.

          Ravi Gummadi made changes -
          Attachment MR-743.v2.patch [ 12413986 ]
          Ravi Gummadi added a comment -

          One unit test failed with the previous patch because of an issue in LocalJobRunner.
          Attaching a new patch fixing the issue.

          All unit tests passed on my local machine.

          Please review and provide your comments.

          Ravi Gummadi made changes -
          Attachment MR-743.v2.1.patch [ 12414068 ]
          Ravi Gummadi added a comment -

          Attaching a new patch that cleans up the test case code so that it calls the mapTask.run() method directly. TestMapTask no longer overrides run(); it overrides a new method, startReporter().

          Please review and provide your comments.

          Ravi Gummadi made changes -
          Attachment MR-743.v2.2.patch [ 12414088 ]
          Ravi Gummadi made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Ravi Gummadi made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ravi Gummadi added a comment -

          Missed adding the Apache license header in the new test case file.
          Attaching a new patch.

          Ravi Gummadi made changes -
          Attachment MR-743.v2.3.patch [ 12414101 ]
          Ravi Gummadi made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Ravi Gummadi added a comment -

          When I included a hadoop-core-0.21.0-dev.jar built with the patch for HADOOP-6163, unit tests passed on my local machine.

          ant test-patch gave

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Ravi Gummadi made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ravi Gummadi added a comment -

          Attaching a patch in which TaskReporter.setProgress() no longer checks whether phases exist, since there are no tasks that need to set progress and do not have phases (both map tasks and reduce tasks have phases).
          The map phase in map tasks and the reduce phase in reduce tasks use this TaskReporter.setProgress().

          Please review and provide your comments.

          Ravi Gummadi made changes -
          Attachment MR-743.v3.patch [ 12414212 ]
          Ravi Gummadi added a comment -

          Unit tests passed on my local machine.

          ant test-patch gave

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Ravi Gummadi made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Devaraj Das added a comment -

          I just committed this. Thanks, Ravi!

          Devaraj Das made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Tsz Wo Nicholas Sze added a comment -

          Since test-patch is broken (see HADOOP-6124), the committed patch introduced a javac warning that went undetected.

              [javac] d:\@sze\hadoop\mapreduce\m1\src\java\org\apache\hadoop\mapred\LocalJobRunner.java:74: warning: [unchecked]
           unchecked call to serialize(T) as a member of the raw type org.apache.hadoop.io.serializer.Serializer
              [javac]       serializer.serialize(splits.get(i));
              [javac]                           ^
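
For context, a small sketch of how this kind of warning arises and how a type parameter avoids it. The nested `Serializer` interface here is a hypothetical stand-in declared only for the example, not the Hadoop interface, and `rawCall`, `typedCall` and `CollectingSerializer` are illustrative names; this is not the change that was made to LocalJobRunner.

```java
import java.util.ArrayList;
import java.util.List;

public class UncheckedCallSketch {

    /** Hypothetical stand-in for a generic serializer interface. */
    interface Serializer<T> {
        void serialize(T value);
    }

    /** Test double that records what it was asked to serialize. */
    static class CollectingSerializer implements Serializer<String> {
        final List<String> written = new ArrayList<>();

        @Override
        public void serialize(String value) {
            written.add(value);
        }
    }

    // Raw type: without the suppression, javac reports
    // "unchecked call to serialize(T) as a member of the raw type".
    @SuppressWarnings({"rawtypes", "unchecked"})
    static void rawCall(Serializer serializer, Object value) {
        serializer.serialize(value);
    }

    // Parameterized type: the compiler can check the argument, so no warning.
    static <T> void typedCall(Serializer<T> serializer, T value) {
        serializer.serialize(value);
    }

    public static void main(String[] args) {
        CollectingSerializer serializer = new CollectingSerializer();
        rawCall(serializer, "first");
        typedCall(serializer, "second");
        System.out.println(serializer.written);
    }
}
```

The raw call compiles but gives up the compile-time check on the argument type, which is what the warning is pointing at.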
          
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition                      Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Patch Available -> Open         1d 2h 27m               2                 Ravi Gummadi    22/Jul/09 13:14
          Open -> Patch Available         12d 7h 54m              3                 Ravi Gummadi    22/Jul/09 14:00
          Patch Available -> Resolved     2h 24m                  1                 Devaraj Das     22/Jul/09 16:25
          Resolved -> Closed              398d 5h 49m             1                 Tom White       24/Aug/10 22:14

            People

            • Assignee:
              Ravi Gummadi
              Reporter:
              Ravi Gummadi
            • Votes:
              1
              Watchers:
              4
