Hadoop Map/Reduce / MAPREDUCE-1265

Include tasktracker name in the task attempt error log

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      When a task attempt receives an error, TaskInProgress logs the task attempt ID and the diagnostic string in the JobTracker log.
      Example:
      2009-xx-xx 23:50:45,994 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_r_000009_1: Error: java.lang.OutOfMemoryError: Java heap space
      2009-xx-xx 22:53:53,146 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_m_000478_0: Task attempt_2009xxxx_xxxx_m_000478_0 failed to report status for 601 seconds. Killing!

      When we want to debug a machine (for example, a node that has been blacklisted in the past few days),
      we have to use the task attempt ID to find the TaskTracker (TT). This is not very convenient.

      It would be nice if we could also log the TaskTracker on which the error occurred.
      That way we can just grep for the hostname to quickly find all the relevant error messages.
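
      The idea can be sketched as follows. This is only an illustration under stated assumptions, not the attached patch: the class name, the method names, the "on <tracker>" message layout, and the example tracker names are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only; not the code from MAPREDUCE-1265.patch.
class TaskErrorLogSketch {

    // Builds an error line that also names the TaskTracker.
    // The "on <tracker>" layout is an assumption made for this example.
    static String formatError(String attemptId, String trackerName, String diagnostics) {
        return "Error from " + attemptId + " on " + trackerName + ": " + diagnostics;
    }

    // Rough equivalent of grepping the JobTracker log for a hostname.
    static List<String> linesMentioning(List<String> logLines, String hostname) {
        return logLines.stream()
                .filter(line -> line.contains(hostname))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Tracker names below are made up for the example.
        List<String> log = Arrays.asList(
                formatError("attempt_2009xxxx_xxxx_r_000009_1", "tracker_host1.example.com",
                        "Error: java.lang.OutOfMemoryError: Java heap space"),
                formatError("attempt_2009xxxx_xxxx_m_000478_0", "tracker_host2.example.com",
                        "Task attempt_2009xxxx_xxxx_m_000478_0 failed to report status for 601 seconds. Killing!"));

        // Print every error that occurred on one suspect machine.
        for (String line : linesMentioning(log, "host1.example.com")) {
            System.out.println(line);
        }
    }
}
```

      With the tracker name in each line, all errors for one machine can be found with a single hostname search, without first mapping each task attempt ID back to a node.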

      1. MAPREDUCE-1265-v2.patch
        1 kB
        Scott Chen
      2. MAPREDUCE-1265.patch
        1 kB
        Scott Chen

        Activity

        Scott Chen created issue -
        Scott Chen made changes -
        Field Original Value New Value
        Attachment MAPREDUCE-1265.patch [ 12426851 ]
        Scott Chen made changes -
        Attachment MAPREDUCE-1265.patch [ 12426851 ]
        Scott Chen made changes -
        Attachment MAPREDUCE-1265.patch [ 12426852 ]
        Scott Chen made changes -
        Attachment MAPREDUCE-1265-v2.patch [ 12426933 ]
        Scott Chen made changes -
        Summary (original value): Include jobId and hostname in the task attempt error log
        Summary (new value): Include tasktracker name in the task attempt error log
        Description (original value):
        When task attempt receive an error, TaskInProgress will log the task attempt id and diagnosis string in the JobTracker log.
        Ex:
        2009-xx-xx 23:50:45,994 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_r_000009_1: Error: java.lang.OutOfMemoryError: Java heap space
        2009-xx-xx 22:53:53,146 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_m_000478_0: Task attempt_2009xxxx_xxxx_m_000478_0 failed to report status for 601 seconds. Killing!

        When we want to debug a machine or a job. We have to use the task attempt id to find these information.

        It will be much more convenient if we can just log them together.
        This way we can just grep the jobId or hostname to quickly find all the relevant error message.

        Description (new value):
        When task attempt receive an error, TaskInProgress will log the task attempt id and diagnosis string in the JobTracker log.
        Ex:
        2009-xx-xx 23:50:45,994 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_r_000009_1: Error: java.lang.OutOfMemoryError: Java heap space
        2009-xx-xx 22:53:53,146 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_m_000478_0: Task attempt_2009xxxx_xxxx_m_000478_0 failed to report status for 601 seconds. Killing!

        When we want to debug a machine for example, a blacklisted node.
        We have to use the task attempt id to find these information. This is not very convenient.

        It will be nice if we can also log the tasktracker which cauces this error.
        This way we can just grep the hostname to quickly find all the relevant error message.
        Scott Chen made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 0.22.0 [ 12314184 ]
        Fix Version/s 0.22.0 [ 12314184 ]
        Scott Chen made changes -
        Description (original value):
        When task attempt receive an error, TaskInProgress will log the task attempt id and diagnosis string in the JobTracker log.
        Ex:
        2009-xx-xx 23:50:45,994 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_r_000009_1: Error: java.lang.OutOfMemoryError: Java heap space
        2009-xx-xx 22:53:53,146 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_m_000478_0: Task attempt_2009xxxx_xxxx_m_000478_0 failed to report status for 601 seconds. Killing!

        When we want to debug a machine for example, a blacklisted node.
        We have to use the task attempt id to find these information. This is not very convenient.

        It will be nice if we can also log the tasktracker which cauces this error.
        This way we can just grep the hostname to quickly find all the relevant error message.

        Description (new value):
        When task attempt receive an error, TaskInProgress will log the task attempt id and diagnosis string in the JobTracker log.
        Ex:
        2009-xx-xx 23:50:45,994 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_r_000009_1: Error: java.lang.OutOfMemoryError: Java heap space
        2009-xx-xx 22:53:53,146 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_m_000478_0: Task attempt_2009xxxx_xxxx_m_000478_0 failed to report status for 601 seconds. Killing!

        When we want to debug a machine for example, a blacklisted node.
        We have to use the task attempt id to find the TT. This is not very convenient.

        It will be nice if we can also log the tasktracker which cauces this error.
        This way we can just grep the hostname to quickly find all the relevant error message.
        Scott Chen made changes -
        Description (original value):
        When task attempt receive an error, TaskInProgress will log the task attempt id and diagnosis string in the JobTracker log.
        Ex:
        2009-xx-xx 23:50:45,994 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_r_000009_1: Error: java.lang.OutOfMemoryError: Java heap space
        2009-xx-xx 22:53:53,146 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_m_000478_0: Task attempt_2009xxxx_xxxx_m_000478_0 failed to report status for 601 seconds. Killing!

        When we want to debug a machine for example, a blacklisted node.
        We have to use the task attempt id to find the TT. This is not very convenient.

        It will be nice if we can also log the tasktracker which cauces this error.
        This way we can just grep the hostname to quickly find all the relevant error message.

        Description (new value):
        When task attempt receive an error, TaskInProgress will log the task attempt id and diagnosis string in the JobTracker log.
        Ex:
        2009-xx-xx 23:50:45,994 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_r_000009_1: Error: java.lang.OutOfMemoryError: Java heap space
        2009-xx-xx 22:53:53,146 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_m_000478_0: Task attempt_2009xxxx_xxxx_m_000478_0 failed to report status for 601 seconds. Killing!

        When we want to debug a machine for example, a node has been blacklisted in the past few days.
        We have to use the task attempt id to find the TT. This is not very convenient.

        It will be nice if we can also log the tasktracker which causes this error.
        This way we can just grep the hostname to quickly find all the relevant error message.
        Scott Chen made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Scott Chen made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        dhruba borthakur made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Tom White made changes -
        Fix Version/s 0.21.0 [ 12314045 ]
        Fix Version/s 0.22.0 [ 12314184 ]
        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee: Scott Chen
          • Reporter: Scott Chen
          • Votes: 0
          • Watchers: 3
