  1. Hadoop Common
  2. HADOOP-3039

Runtime exceptions not killing job

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.16.2
    • Component/s: None
    • Labels:
      None

      Description

      Previously, if a map or reduce task threw a runtime exception such as a NullPointerException (NPE), the task, and ultimately the job, failed in short order. In 0.16.0, when the reduce tasks started throwing NPEs, the tasks just hung and were killed only after they eventually timed out. A task should be killed immediately if it throws an NPE.

      Thread dump shows:
      "DestroyJavaVM" prio=10 tid=0x0805f800 nid=0x6b5a waiting on condition [0x00000000..0xbfffcc90]
      java.lang.Thread.State: RUNNABLE

      "Thread-12" prio=10 tid=0x083f1400 nid=0x6b87 in Object.wait() [0xa2f37000..0xa2f37eb0]
      java.lang.Thread.State: TIMED_WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)
      - waiting on <0xa3af62a0> (a java.util.LinkedList)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1680)
      - locked <0xa3af62a0> (a java.util.LinkedList)

      "Comm thread for task_200803181240_0001_r_000000_0" daemon prio=10 tid=0x0841f000 nid=0x6b6f waiting on condition [0xa307c000..0xa307c130]
      java.lang.Thread.State: TIMED_WAITING (sleeping)
      at java.lang.Thread.sleep(Native Method)
      at org.apache.hadoop.mapred.Task$1.run(Task.java:283)
      at java.lang.Thread.run(Unknown Source)

      "org.apache.hadoop.dfs.DFSClient$LeaseChecker@edf3f6" daemon prio=10 tid=0x083fc400 nid=0x6b6d waiting on condition [0xa30cd000..0xa30cd1b0]
      java.lang.Thread.State: TIMED_WAITING (sleeping)
      at java.lang.Thread.sleep(Native Method)
      at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:626)
      at java.lang.Thread.run(Unknown Source)

      "IPC Client connection to localhost/127.0.0.1:9000" daemon prio=10 tid=0x083f6800 nid=0x6b6c in Object.wait() [0xa311d000..0xa311e030]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)
      - waiting on <0xa4ac0860> (a org.apache.hadoop.ipc.Client$Connection)
      at java.lang.Object.wait(Object.java:485)
      at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:247)
      - locked <0xa4ac0860> (a org.apache.hadoop.ipc.Client$Connection)
      at org.apache.hadoop.ipc.Client$Connection.run(Client.java:286)

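      It looks like the task JVM is waiting for the DataStreamer thread ("Thread-12" above) to terminate: unlike the other threads in the dump, it is not marked daemon, and a JVM does not exit while a non-daemon thread is still running.
      When I called streamer.setDaemon(true), the behavior was fine.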
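
      The effect can be reproduced outside Hadoop. The following is a minimal sketch, not the attached patch: the class DaemonStreamerSketch, its methods, and the dataQueue field are hypothetical names chosen for illustration, and the background thread only imitates a streamer that waits on a LinkedList with a timeout, as Thread-12 does in the dump. It assumes Java 8 or later.

```java
import java.util.LinkedList;

public class DaemonStreamerSketch {
    // Hypothetical stand-in for the queue a streamer thread waits on.
    private static final LinkedList<byte[]> dataQueue = new LinkedList<>();

    // Builds (but does not start) a thread that waits on the queue until interrupted.
    public static Thread newStreamer(boolean daemon) {
        Thread t = new Thread(() -> {
            synchronized (dataQueue) {
                try {
                    while (true) {
                        dataQueue.wait(1000); // shows up as TIMED_WAITING in a thread dump
                    }
                } catch (InterruptedException e) {
                    // Interrupted: fall through and let the thread end.
                }
            }
        }, "streamer-sketch");
        t.setDaemon(daemon); // must be called before start()
        return t;
    }

    // A live non-daemon thread keeps the JVM running after main() returns.
    public static boolean blocksJvmExit(Thread t) {
        return t.isAlive() && !t.isDaemon();
    }

    // Interrupts the thread and waits up to five seconds for it to end.
    public static void stopAndWait(Thread t) {
        t.interrupt();
        try {
            t.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Change true to false to reproduce the hang: main() returns but the JVM stays up.
        Thread streamer = newStreamer(true);
        streamer.start();
        try {
            throw new NullPointerException("simulated task failure");
        } catch (RuntimeException e) {
            System.err.println("task failed: " + e);
        }
        System.out.println("streamer blocks JVM exit: " + blocksJvmExit(streamer));
    }
}
```

      With the daemon flag set, the process ends as soon as main() returns. With it cleared, the process stays alive until the streamer thread is interrupted or killed, which matches the hang described above.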

        Attachments

        1. patch-3039.txt
          0.5 kB
          Amareshwari Sriramadasu

              People

              • Assignee:
                amareshwari Amareshwari Sriramadasu
                Reporter:
                amareshwari Amareshwari Sriramadasu