Hadoop Common / HADOOP-293

Map/reduce jobs fail without reporting a reason

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3.1
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None

      Description

      In the web interface (WI) I often see tasks reported as failed with no information about the reason for the failure.
      This makes analyzing and fixing the problem much harder.
      The reason for a failure should always be reported in the WI.

      1. report-error-1.patch
        3 kB
        Mikkel Kamstrup Erlandsen
      2. err-report.patch
        5 kB
        Owen O'Malley

        Activity

        Owen O'Malley made changes -
        Component/s mapred [ 12310690 ]
        Doug Cutting made changes -
        Status Resolved [ 5 ] -> Closed [ 6 ]
        Doug Cutting made changes -
        Resolution Fixed [ 1 ]
        Fix Version/s 0.7.0 [ 12312051 ]
        Status Patch Available [ 10002 ] -> Resolved [ 5 ]
        Doug Cutting added a comment -

        I just committed this. Thanks, Owen!

        Owen O'Malley made changes -
        Status Open [ 1 ] -> Patch Available [ 10002 ]
        Owen O'Malley made changes -
        Attachment err-report.patch [ 12341139 ]
        Owen O'Malley added a comment -

        The problem was that the web UI was not looking at the complete list of diagnostics, only at the diagnostic that was sent with the last status report. This patch makes it generate the complete list.
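
The difference between showing only the last reported diagnostic and showing the complete list can be illustrated with a minimal Java sketch. It is not taken from err-report.patch: the class `SketchTaskDiagnostics` and its methods are hypothetical names, and the assumption that a later status report with an empty diagnostic hides the earlier failure message is only one plausible way the symptom could arise.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical per-task record; the names are illustrative and are not Hadoop's.
class SketchTaskDiagnostics {
    // Every non-empty diagnostic received so far, in arrival order.
    private final List<String> allDiagnostics = new ArrayList<>();
    // Only the diagnostic carried by the most recent status report.
    private String lastReported = "";

    // Called once per status report; a null diagnostic is treated as empty.
    void onStatusReport(String diagnostic) {
        lastReported = (diagnostic == null) ? "" : diagnostic;
        if (!lastReported.isEmpty()) {
            allDiagnostics.add(lastReported);
        }
    }

    // What a UI would show if it looked only at the last status report.
    String lastReported() {
        return lastReported;
    }

    // What a UI would show if it generated the complete list.
    List<String> all() {
        return Collections.unmodifiableList(allDiagnostics);
    }
}
```

Under these assumptions, a task that reports an error and then sends one more status report without a diagnostic would have an empty `lastReported()` while `all()` still contains the error.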

        Doug Cutting made changes -
        Fix Version/s 0.6.0 [ 12312025 ]
        Doug Cutting made changes -
        Fix Version/s 0.5.0 [ 12311939 ]
        Fix Version/s 0.6.0 [ 12312025 ]
        Doug Cutting made changes -
        Workflow no-reopen-closed [ 12373610 ] -> no-reopen-closed, patch-avail [ 12377495 ]
        Mikkel Kamstrup Erlandsen made changes -
        Attachment report-error-1.patch [ 12337880 ]
        Mikkel Kamstrup Erlandsen added a comment -

        I've had my share of troubles regarding this too. When a task encounters an error, all I see is:

        Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        ...
        <snip useless info>

        I attach a preview patch of my suggestion. It is against 0.4, but I'll forward-port it to head and integrate it more with the rest of the system if the approach is generally accepted by the devs. Please consider the patch an idea preview, not a serious stab at the problem.

        The approach is to add a public JobStatus.lastError string, which can be set from any throwable via JobStatus.setLastError(Throwable t). Setting this at relevant places (e.g. on errors in mapred.LocalJobRunner.run(), as in the patch) is useful for debugging purposes (for me at least).
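
A rough Java sketch of the lastError idea described above might look like the following. It is not the content of report-error-1.patch: `SketchJobStatus` and `SketchJobRunner` are hypothetical stand-ins for JobStatus and LocalJobRunner, and exposing the string through a getter rather than a public field is a choice made here for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for a job status record; not the actual Hadoop JobStatus class.
class SketchJobStatus {
    // Text of the most recent error, or null if none has been recorded.
    private String lastError;

    // Record a throwable as the last error, keeping its full stack trace as text.
    void setLastError(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        t.printStackTrace(out);
        out.flush();
        lastError = buffer.toString();
    }

    String getLastError() {
        return lastError;
    }
}

// Hypothetical runner showing one place where the error could be captured.
class SketchJobRunner {
    // Runs the job and returns a status that carries the failure cause, if any.
    static SketchJobStatus run(Runnable job) {
        SketchJobStatus status = new SketchJobStatus();
        try {
            job.run();
        } catch (Throwable t) {
            // Keep the cause instead of collapsing it into a bare "Job failed!".
            status.setLastError(t);
        }
        return status;
    }
}
```

With something along these lines, a client that currently sees only "Job failed!" could print `getLastError()` to learn which exception caused the failure.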

        Doug Cutting made changes -
        Fix Version/s 0.4.0 [ 12311021 ]
        Fix Version/s 0.5.0 [ 12311939 ]
        Owen O'Malley made changes -
        Field Original Value New Value
        Assignee Owen O'Malley [ owen.omalley ]
        Yoram Arnon created issue -

          People

          • Assignee:
            Owen O'Malley
            Reporter:
            Yoram Arnon
          • Votes:
            0
            Watchers:
            0
