Hadoop Common
  HADOOP-1740

Certain Pipes tasks fail, after exiting the C++ application

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.14.1
    • Component/s: None
    • Labels:
      None
    • Environment:

      Version: 0.15.0-dev, r565628
      Compiled: Tue Aug 14 20:55:37 UTC 2007 by hadoopqa
      165 nodes

      Description

      Steps to reproduce:
      Run a pipes job. Mine had 1182 mappers and 300 reducers and produced 4,353,892,559 records (212,030,552,146 bytes) of output.

      Some of the tasks failed at random, and all of them had the same exception (probably a race condition).

      task_200708201818_0002_m_000681_0 tip_200708201818_0002_m_000681 node075 FAILED

      java.lang.NullPointerException
      at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:592)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:190)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1800)

      task_200708201818_0002_m_000681_1 tip_200708201818_0002_m_000681 node1127 FAILED

      java.lang.NullPointerException
      at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:592)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:190)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1800)

      task_200708201818_0002_m_000868_0 tip_200708201818_0002_m_000868 node160 FAILED

      java.io.IOException: Task process exit with nonzero status of 1.
      at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:405)
      at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:291)


      2007-08-20 19:25:59,909 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
      2007-08-20 19:26:00,002 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 300
      2007-08-20 19:26:01,344 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
      2007-08-20 19:26:01,345 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
      2007-08-20 19:49:26,727 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
      java.lang.NullPointerException
      at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:592)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:190)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1800)

      1. 1740.patch
        0.7 kB
        Devaraj Das
      2. 1740.patch
        0.7 kB
        Devaraj Das

        Activity

        Owen O'Malley added a comment -

        I just committed this. Thanks, Devaraj.

        Hadoop QA added a comment -

        +1

        http://issues.apache.org/jira/secure/attachment/12364538/1740.patch applied and successfully tested against trunk revision r569501.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/617/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/617/console

        Devaraj Das added a comment -

        This should fix the findbugs issue.

        Hadoop QA added a comment -

        +0, new Findbugs warnings

        http://issues.apache.org/jira/secure/attachment/12364389/1740.patch applied and successfully tested against trunk revision r569063, but there appear to be new Findbugs warnings introduced by this patch.

        New Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/613/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/613/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/613/console

        Devaraj Das added a comment -

        It looks like this bug was introduced when the patch for HADOOP-1698 was committed. That patch sets keyValBuffer to null under a certain condition, which was not done before HADOOP-1698. The attached patch should fix the NPE. Could you please give it a try?
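
        The attached patch is not reproduced on this page, so the following is only a minimal sketch of the failure mode described above, not the actual MapTask code or the actual fix. Only the names keyValBuffer and flush come from this issue; the class FlushSketch, the OutputBuffer type, and the methods collect, spill, flushUnguarded and flushGuarded are hypothetical. It assumes a buffer that is released (set to null) on one code path and then dereferenced by a later flush.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT the real MapTask.MapOutputBuffer.
class FlushSketch {

    public static class OutputBuffer {
        // Named after the field mentioned in the issue; its real type is not shown there.
        private List<String> keyValBuffer = new ArrayList<>();
        private int spills = 0;

        // Buffers one record, re-creating the buffer if it was released earlier.
        public void collect(String record) {
            if (keyValBuffer == null) {
                keyValBuffer = new ArrayList<>();
            }
            keyValBuffer.add(record);
        }

        // Writes out buffered records and releases the buffer by nulling it.
        // This stands in for the "set to null under a condition" path.
        public void spill() {
            if (keyValBuffer != null && !keyValBuffer.isEmpty()) {
                spills++;
            }
            keyValBuffer = null;
        }

        // Assumes the buffer is always present; throws NullPointerException
        // when spill() has already released it.
        public void flushUnguarded() {
            if (!keyValBuffer.isEmpty()) {
                spills++;
            }
            keyValBuffer = null;
        }

        // Same work, but tolerates a buffer that was already released.
        public void flushGuarded() {
            if (keyValBuffer != null && !keyValBuffer.isEmpty()) {
                spills++;
            }
            keyValBuffer = null;
        }

        public int spillCount() {
            return spills;
        }
    }

    public static void main(String[] args) {
        OutputBuffer unguarded = new OutputBuffer();
        unguarded.collect("key\tvalue");
        unguarded.spill();
        try {
            unguarded.flushUnguarded();
            System.out.println("unguarded flush completed");
        } catch (NullPointerException e) {
            System.out.println("unguarded flush threw NullPointerException");
        }

        OutputBuffer guarded = new OutputBuffer();
        guarded.collect("key\tvalue");
        guarded.spill();
        guarded.flushGuarded();
        System.out.println("guarded flush completed, spills=" + guarded.spillCount());
    }
}
```

        A null check is the smallest change that avoids the exception in this sketch. Whether the real patch guards the dereference or avoids nulling the field in the first place cannot be determined from this page.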

        Srikanth Kakani added a comment -

        Bumping up the priority to blocker. Except for one, no run was successful.


          People

          • Assignee:
            Devaraj Das
            Reporter:
            Srikanth Kakani