Hadoop Common
HADOOP-961

A CLI tool to get the event logs from a job

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.1
    • Fix Version/s: 0.11.0
    • Component/s: None
    • Labels: None

      Description

      Here is a little tool to list the task completion events for a given job. The output can be used to find where each task ran.

      Attachment: event-log.patch (2 kB, Owen O'Malley)

        Activity

        Owen O'Malley added a comment:

        The command is invoked as:

        % bin/hadoop job -events job_0001

        and the output looks like:

        Task completion events for job_0001
        SUCCEEDED task_0001_m_000049_0 http://node:2030/tasklog.jsp?plaintext=true&taskid=task_0001_m_000049_0&all=true
        SUCCEEDED task_0001_m_000019_0 http://node:2030/tasklog.jsp?plaintext=true&taskid=task_0001_m_000019_0&all=true
        SUCCEEDED task_0001_m_000022_0 http://node:2030/tasklog.jsp?plaintext=true&taskid=task_0001_m_000022_0&all=true
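A minimal sketch of splitting this output into status, task id and host, assuming POSIX sh and awk and one event per line in the `STATUS TASKID URL` form shown above. The header line is skipped because it has more than three fields. The function name `parse_events` is hypothetical, not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print "STATUS TASKID HOST"
# for each event line of `bin/hadoop job -events` output.
# Assumes one event per line: STATUS TASKID http://HOST:PORT/tasklog.jsp?...
parse_events() {
  awk '
    # Event lines have exactly three fields; the header line has more.
    NF == 3 && $3 ~ /^http:\/\// {
      host = $3
      sub(/^http:\/\//, "", host)  # drop the scheme
      sub(/\/.*$/, "", host)       # drop the path
      sub(/:.*$/, "", host)        # drop ":port"
      print $1, $2, host
    }'
}

# Demo on the sample output quoted in this issue.
parse_events <<'EOF'
Task completion events for job_0001
SUCCEEDED task_0001_m_000049_0 http://node:2030/tasklog.jsp?plaintext=true&taskid=task_0001_m_000049_0&all=true
SUCCEEDED task_0001_m_000019_0 http://node:2030/tasklog.jsp?plaintext=true&taskid=task_0001_m_000019_0&all=true
EOF
# prints:
# SUCCEEDED task_0001_m_000049_0 node
# SUCCEEDED task_0001_m_000019_0 node
```

On a live cluster the same function could be fed from the command itself, e.g. `bin/hadoop job -events job_0001 | parse_events`.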

        Owen O'Malley added a comment:

        On a side note, it would probably be better for the events to carry just the tasktracker's base URL. Currently each event carries the URL for fetching that task's log output.
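Until the events carry a base URL, it can be recovered from the task-log URL. A sketch in POSIX sh, assuming every URL has the `http://host:port/...` form seen in the sample output; `tracker_base_url` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reduce a task-log URL to the tasktracker base URL.
# Assumes the http://HOST:PORT/path form; other input is printed unchanged.
tracker_base_url() {
  # Keep "http://" plus everything up to the first "/" after it.
  printf '%s\n' "$1" | sed -e 's|^\(http://[^/]*\).*|\1|'
}

tracker_base_url 'http://node:2030/tasklog.jsp?plaintext=true&taskid=task_0001_m_000049_0&all=true'
# prints http://node:2030
```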

        To find where the reduces were run, use:

        bin/hadoop job -events job_0001 | grep '_r_' | grep SUCCEEDED | sort | \
        sed -e 's/[^ ]* //' -e 's|http://||' -e 's/:.*//'
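A related view is a per-host count of successful reduce attempts. This sketch assumes one event per line and that reduce task ids contain `_r_` (by analogy with the map id `task_0001_m_000049_0`); `reduces_per_host` is a hypothetical name, and it reports counts rather than listing individual tasks.

```shell
#!/bin/sh
# Hypothetical helper: print "HOST COUNT" for successful reduce attempts,
# reading `bin/hadoop job -events` output on stdin (one event per line assumed).
reduces_per_host() {
  awk '
    # Only three-field event lines for SUCCEEDED reduce tasks.
    NF == 3 && $1 == "SUCCEEDED" && $2 ~ /_r_/ {
      host = $3
      sub(/^http:\/\//, "", host)  # drop the scheme
      sub(/\/.*$/, "", host)       # drop the path
      sub(/:.*$/, "", host)        # drop ":port"
      count[host]++
    }
    END { for (h in count) print h, count[h] }' | sort
}
```

Usage: `bin/hadoop job -events job_0001 | reduces_per_host`. The final `sort` is there because awk's `for (h in count)` iterates in an unspecified order.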

        Hadoop QA added a comment:

        +1, because http://issues.apache.org/jira/secure/attachment/12349948/event-log.patch applied and successfully tested against trunk revision r501616.

        Hairong Kuang added a comment:

        It looks like the output file name of a reducer is derived from the partition id, so I am not able to get the output file name from the task id. Could you please include the partition id in TaskCompletionEvent as well?

        Hairong Kuang added a comment:

        OK, I realized that the partition id is the field after [m/r] in a task id. Although a reduce task's id and its output file name both embed the same partition id, the task id is generated in TaskInProgress, where NumberFormat sets the minimum number of integer digits to 6, while the output file name is generated in ReduceTask, where NumberFormat sets the minimum number of integer digits to 5.
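Following that observation, a reduce task id can be mapped to its output file name by parsing the partition field and re-padding it from six digits to five. A sketch, assuming the `task_JOB_r_PARTITION_ATTEMPT` id layout shown in this issue and a `part-NNNNN` file name (the `part-` prefix is an assumption, not stated here); `reduce_output_name` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a reduce task id such as task_0001_r_000003_0
# to an output file name, re-padding the partition from 6 to 5 digits.
# ASSUMPTION: output files are named part-NNNNN (prefix not stated in this issue).
reduce_output_name() {
  printf '%s\n' "$1" | awk -F_ '
    # Fields: task, job number, r, partition, attempt
    NF == 5 && $1 == "task" && $3 == "r" {
      printf "part-%05d\n", $4 + 0   # awk reads "000003" as decimal 3
      found = 1
    }
    END { exit (found ? 0 : 1) }     # non-zero status if not a reduce id
  '
}

reduce_output_name task_0001_r_000003_0   # prints part-00003
```

The arithmetic is done in awk because `$4 + 0` is always decimal, whereas the shell's `printf '%d'` would treat a zero-padded value such as `000008` as an (invalid) octal number.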

        Doug Cutting added a comment:

        I just committed this. Thanks, Owen!


          People

          • Assignee: Owen O'Malley
            Reporter: Owen O'Malley
          • Votes: 0
            Watchers: 0

            Dates

            • Created:
              Updated:
              Resolved:
