HADOOP-4296 (Hadoop Common)

Spasm of JobClient failures on successful jobs every once in a while

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.1
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      At very busy times we get a wave of job client failures, all at the same time. The failures occur when the job is about to complete. When we look at the job history files, the jobs have actually completed. Here's the stack:

      08/09/27 02:18:00 INFO mapred.JobClient: map 100% reduce 98%
      08/09/27 02:18:41 INFO mapred.JobClient: map 100% reduce 99%
      java.lang.NullPointerException
      at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:993)
      at com.facebook.hive.common.columnSetLoader.main(columnSetLoader.java:535)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
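
The trace shows only that something dereferenced at JobClient.java:993 was null; the report does not say what. One plausible reading, suggested by the "delayretire" patch names but not confirmed anywhere in this text, is that the tracker had already dropped the finished job by the time the client polled again, so the client received no status object. Below is a minimal sketch of a polling loop that treats a missing status as its own outcome instead of dereferencing it. All names (`RetiredJobPollingSketch`, `Status`, `Outcome`, `pollUntilDone`, `replay`) are hypothetical and are not Hadoop's API.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names come from Hadoop. */
class RetiredJobPollingSketch {

    /** What the client concludes after polling. */
    enum Outcome { SUCCEEDED, FAILED, UNKNOWN_RETIRED }

    /** Minimal stand-in for one job status snapshot. */
    static final class Status {
        final boolean complete;
        final boolean successful;

        Status(boolean complete, boolean successful) {
            this.complete = complete;
            this.successful = successful;
        }
    }

    /**
     * Polls the source until the job completes. A null from the source
     * models (by assumption) a tracker that no longer knows the job.
     */
    static Outcome pollUntilDone(Supplier<Status> statusSource, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            Status s = statusSource.get();
            if (s == null) {
                // Report the missing status explicitly rather than
                // dereferencing it and failing with a NullPointerException.
                return Outcome.UNKNOWN_RETIRED;
            }
            if (s.complete) {
                return s.successful ? Outcome.SUCCEEDED : Outcome.FAILED;
            }
        }
        throw new IllegalStateException(
            "job did not finish within " + maxPolls + " polls");
    }

    /** Replays the given (non-null) statuses in order, then returns null. */
    static Supplier<Status> replay(Status... statuses) {
        Deque<Status> queue = new ArrayDeque<>(Arrays.asList(statuses));
        return queue::poll; // poll() yields null once the queue is empty
    }

    public static void main(String[] args) {
        Status running = new Status(false, false);
        // The job disappears between two polls while still reported as running.
        System.out.println(pollUntilDone(replay(running, running), 10));
        // The client sees the completed status before the job disappears.
        System.out.println(
            pollUntilDone(replay(running, new Status(true, true)), 10));
    }
}
```

A caller that receives `UNKNOWN_RETIRED` could then consult another record, such as the job history files mentioned in the description, before declaring the job failed. That follow-up is also an assumption and not something the report prescribes.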

      Attachments

        1. 4296_jt_delayretire4.patch
          3 kB
          Dhruba Borthakur
        2. 4296_jt_delayretire3.patch
          3 kB
          Dhruba Borthakur
        3. 4296_jt_delayretire2.patch
          2 kB
          Dhruba Borthakur
        4. 4296_jt_delayretire.patch
          3 kB
          Dhruba Borthakur

          People

            Assignee: Dhruba Borthakur
            Reporter: Joydeep Sen Sarma
            Votes: 0
            Watchers: 3
