Hadoop Map/Reduce / MAPREDUCE-3678

The Map tasks logs should have the value of input split it processed

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 2.0.0-alpha
    • Fix Version/s: 1.2.0, 2.0.3-alpha
    • Component/s: mrv1, mrv2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      A map task's syslog now carries basic information about the InputSplit it processed.

      Description

      It would be easier to debug some corner cases in tasks if we knew which input split a task processed. The MapReduce TaskTracker log should include this information. The job details web UI should also display the split along with the Split Locations.

      Sample:
      Input Split
      hdfs://myserver:9000/userdata/sampleapp/inputdir/file1.csv - <split no>/<offset from beginning of file>

      This would be very helpful for tracking down data quality issues when processing large data volumes.
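Once a task log names the split, a log line can be mapped back to a file region. The following is a minimal sketch only: it assumes a log line ending in `Processing split: <path>:<start>+<length>`, a shape this issue does not specify, and `SplitLineParser` and `FileRange` are hypothetical names used for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitLineParser {
    // ASSUMPTION: the line ends with "Processing split: <path>:<start>+<length>".
    // The issue text does not confirm this format.
    private static final Pattern SPLIT_LINE =
        Pattern.compile("Processing split: (.+):(\\d+)\\+(\\d+)\\s*$");

    // Hypothetical holder for the byte range a map task read.
    static final class FileRange {
        final String path;
        final long start;
        final long length;

        FileRange(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        // Exclusive end offset of the range.
        long end() {
            return start + length;
        }
    }

    // Returns the parsed range, or empty if the line has a different shape
    // (for example, a split that is not file based).
    static Optional<FileRange> parse(String logLine) {
        Matcher m = SPLIT_LINE.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new FileRange(
            m.group(1), Long.parseLong(m.group(2)), Long.parseLong(m.group(3))));
    }

    public static void main(String[] args) {
        String line = "INFO Processing split: "
            + "hdfs://myserver:9000/userdata/sampleapp/inputdir/file1.csv:1024+2048";
        parse(line).ifPresent(r ->
            System.out.println(r.path + " bytes [" + r.start + ", " + r.end() + ")"));
    }
}
```

The greedy path group backtracks to the last colon that is followed by `start+length`, so a host and port inside the URI do not confuse the match.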

      1. MAPREDUCE-3678-branch-1.patch
        0.9 kB
        Harsh J
      2. MAPREDUCE-3678.patch
        1 kB
        Harsh J

          Activity

          Harsh J added a comment -

          The task's own logs are the best place for this, not the daemons' logs.

          The reason it is tedious to do and maintain at the framework level is that not all InputSplits are FileSplits, and formats that do use FileSplits may use them in different ways as well (CombineFileInputFormat, for instance).

          The InputSplit interface by itself is path-agnostic.
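To illustrate the point about path-agnostic splits, here is a minimal sketch using stand-in types. `Split`, `FileRangeSplit`, `QuerySplit`, and `describe` are hypothetical and are not the Hadoop classes; the message text is also an assumption. The idea shown is that a logger which relies only on the split's own string form works for any split type, file based or not.

```java
import java.util.List;

class SplitLogSketch {
    // Stand-in for a path-agnostic split interface (NOT the Hadoop InputSplit).
    interface Split {
        long getLength();
    }

    // Stand-in for a file-based split: a byte range within one file.
    static final class FileRangeSplit implements Split {
        private final String path;
        private final long start;
        private final long length;

        FileRangeSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public long getLength() {
            return length;
        }

        @Override
        public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    // Stand-in for a split that has no path at all (for example, a row range).
    static final class QuerySplit implements Split {
        private final String query;

        QuerySplit(String query) {
            this.query = query;
        }

        @Override
        public long getLength() {
            return 0L;
        }

        @Override
        public String toString() {
            return "query[" + query + "]";
        }
    }

    // Builds the log message from the split's own description only,
    // so no knowledge of paths or offsets is needed here.
    static String describe(Split split) {
        return "Processing split: " + split;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
            new FileRangeSplit(
                "hdfs://myserver:9000/userdata/sampleapp/inputdir/file1.csv", 0L, 1024L),
            new QuerySplit("rows 0-999"));
        for (Split s : splits) {
            System.out.println(describe(s));
        }
    }
}
```

The trade-off in this sketch is that the log content is only as informative as each split type's `toString()`, which matches the concern that different formats describe their input in different ways.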

          Arun C Murthy added a comment -

          AFAIK MR1 already shows this in taskdetails.jsp - we need to add this to MR2.

          Also, AFAIK, I thought MR1 task-logs had this info logged, something I see missing in MR2 also.

          Bejoy KS added a comment -

          Yes, it is available in taskdetails.jsp. But when a large number of jobs are running on our cluster, within half an hour the jobs move to history, and jobtaskshistory.jsp shows only the following values:
          - Task Id
          - Start Time
          - Finish Time
          - Error

          Can we have one more field here, similar to the status in taskdetails.jsp, that shows the input split the task processed?

          Once the job is in the history viewer, is there currently any way to find this information?

          Harsh J added a comment -

          Hi Arun,

          > AFAIK MR1 already shows this in taskdetails.jsp - we need to add this to MR2.

          But this state is wiped away if the task sets a status, so I don't find it reliable.

          > Also, AFAIK, I thought MR1 task-logs had this info logged, something I see missing in MR2 also.

          We do not log this at all. I'll post patches that target both.

          Harsh J added a comment -

          > Once the job is in history viewer currently do we have any option to find this information?

          Unsure about this one. We can probably handle it via another JIRA if it's important to know this via JH too (that is, without the userlogs). I'll file a new one after completing the patches.

          Harsh J added a comment -

          Patch for branch-1.

          Harsh J added a comment -

          Patch for trunk attached.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12537726/MAPREDUCE-3678.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2653//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2653//console

          This message is automatically generated.

          Harsh J added a comment -

          Hi,

          If no one has any objections to these INFO log additions, I'll commit it in a couple of days.

          This helps projects such as Pig, Hive, etc. without any changes on their end.

          Tom White added a comment -

          +1

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2897 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2897/)
          MAPREDUCE-3678. The Map tasks logs should have the value of input split it processed. Contributed by Harsh J. (harsh) (Revision 1396032)

          Result = SUCCESS
          harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396032
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          Harsh J added a comment -

          Thanks Tom. I committed this to trunk, branch-2 and branch-1.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2835 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2835/)
          MAPREDUCE-3678. The Map tasks logs should have the value of input split it processed. Contributed by Harsh J. (harsh) (Revision 1396032)

          Result = SUCCESS
          harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396032
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2858 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2858/)
          MAPREDUCE-3678. The Map tasks logs should have the value of input split it processed. Contributed by Harsh J. (harsh) (Revision 1396032)

          Result = FAILURE
          harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396032
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1191 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1191/)
          MAPREDUCE-3678. The Map tasks logs should have the value of input split it processed. Contributed by Harsh J. (harsh) (Revision 1396032)

          Result = SUCCESS
          harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396032
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1222 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1222/)
          MAPREDUCE-3678. The Map tasks logs should have the value of input split it processed. Contributed by Harsh J. (harsh) (Revision 1396032)

          Result = SUCCESS
          harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396032
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java

            People

            • Assignee: Harsh J
            • Reporter: Bejoy KS
            • Votes: 0
            • Watchers: 8