Hadoop Common
HADOOP-293

Map reduce jobs fail without reporting a reason

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3.1
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None

      Description

      In the web interface (WI) I often see tasks reported as failed with no information about the reason for the failure.
      This makes analyzing and fixing the problem much harder.
      The WI should always report the reason for a failure.

      1. report-error-1.patch
        3 kB
        Mikkel Kamstrup Erlandsen
      2. err-report.patch
        5 kB
        Owen O'Malley

        Activity

        Yoram Arnon created issue -
        Owen O'Malley made changes -
        Field Original Value New Value
        Assignee Owen O'Malley [ owen.omalley ]
        Doug Cutting made changes -
        Fix Version/s 0.4.0 [ 12311021 ]
        Fix Version/s 0.5.0 [ 12311939 ]
        Mikkel Kamstrup Erlandsen added a comment -

        I've had my share of troubles regarding this too. When a task encounters an error, all I see is:

        Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        ...
        <snip useless info>

        I'm attaching a preview patch of my suggestion. It is against 0.4, but I'll forward-port it to head and integrate it more with the rest of the system if the approach is generally accepted by the devs. Please consider the patch an idea preview, not a serious stab at the problem.

        The approach is to add a public JobStatus.lastError string, which can be set from any throwable via JobStatus.setLastError(Throwable t). Setting this at relevant places (e.g. on errors in mapred.LocalJobRunner.run(), as in the patch) is useful for debugging purposes (for me at least).
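
The idea described in this comment can be sketched roughly as follows. This is only an illustration, not the contents of report-error-1.patch: `SketchJobStatus`, `LastErrorDemo`, and `runTask` are hypothetical names, and the real `JobStatus` and `LocalJobRunner` classes are not reproduced here. The sketch assumes the error is stored as the throwable's full stack trace.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for JobStatus; not the real Hadoop class.
class SketchJobStatus {
    // Public string as proposed in the comment; null until an error is recorded.
    public volatile String lastError;

    // Store the throwable's full stack trace as text so it can be shown later.
    public void setLastError(Throwable t) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        lastError = buffer.toString();
    }
}

class LastErrorDemo {
    // Hypothetical runner: records the failure on the status instead of dropping it.
    static boolean runTask(Runnable task, SketchJobStatus status) {
        try {
            task.run();
            return true;
        } catch (Throwable t) {
            status.setLastError(t);
            return false;
        }
    }

    public static void main(String[] args) {
        SketchJobStatus status = new SketchJobStatus();
        boolean ok = runTask(() -> {
            throw new IllegalStateException("bad input split");
        }, status);
        if (!ok) {
            // The caller can now report why the job failed, not just that it failed.
            System.err.println("Job failed! Cause:\n" + status.lastError);
        }
    }
}
```

With something like this in place, a client that today only sees "Job failed!" could print the recorded cause alongside it.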

        Mikkel Kamstrup Erlandsen made changes -
        Attachment report-error-1.patch [ 12337880 ]
        Doug Cutting made changes -
        Workflow no-reopen-closed [ 12373610 ] no-reopen-closed, patch-avail [ 12377495 ]
        Doug Cutting made changes -
        Fix Version/s 0.5.0 [ 12311939 ]
        Fix Version/s 0.6.0 [ 12312025 ]
        Doug Cutting made changes -
        Fix Version/s 0.6.0 [ 12312025 ]
        Owen O'Malley added a comment -

        The problem was that the web UI was not looking at the complete list of diagnostics, only the diagnostic that was sent with the last status report. This patch makes it generate the complete list.
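
The difference between showing only the last diagnostic and showing the complete list can be illustrated with a small sketch. It is not taken from err-report.patch: `SketchTaskReport` and its methods are hypothetical, and the assumption that a later status report may carry an empty diagnostic is made here purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-task record; the names are illustrative, not Hadoop's.
class SketchTaskReport {
    private final List<String> allDiagnostics = new ArrayList<>();
    private String lastDiagnostic = "";

    // Called once per status report received from the task.
    void onStatusReport(String diagnostic) {
        lastDiagnostic = diagnostic;
        if (!diagnostic.isEmpty()) {
            allDiagnostics.add(diagnostic);
        }
    }

    // What a UI sees if it reads only the most recent report.
    String lastOnly() {
        return lastDiagnostic;
    }

    // What a UI sees if it reads every diagnostic collected so far.
    String complete() {
        return String.join("\n", allDiagnostics);
    }
}

class DiagnosticsDemo {
    public static void main(String[] args) {
        SketchTaskReport report = new SketchTaskReport();
        report.onStatusReport("java.io.IOException: disk full");
        report.onStatusReport(""); // assumed: a final report with no diagnostic
        System.out.println("last only: [" + report.lastOnly() + "]");
        System.out.println("complete:  [" + report.complete() + "]");
    }
}
```

In this sketch the error text is lost from the last-only view as soon as a later report arrives without it, while the accumulated view still contains it.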

        Owen O'Malley made changes -
        Attachment err-report.patch [ 12341139 ]
        Owen O'Malley made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Doug Cutting added a comment -

        I just committed this. Thanks, Owen!

        Doug Cutting made changes -
        Resolution Fixed [ 1 ]
        Fix Version/s 0.7.0 [ 12312051 ]
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Doug Cutting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Owen O'Malley made changes -
        Component/s mapred [ 12310690 ]

          People

          • Assignee: Owen O'Malley
          • Reporter: Yoram Arnon
          • Votes: 0
          • Watchers: 0