  1. Hadoop Map/Reduce
  2. MAPREDUCE-6852

Job#updateStatus() failed with NPE due to race condition

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      Like MAPREDUCE-6762, we found this issue in a cluster where a Pig query occasionally failed with an NPE: "Pig uses JobControl API to track MR job status, but sometimes Job History Server failed to flush job meta files to HDFS which caused the status update failed." Besides the NPE in o.a.h.mapreduce.Job.getJobName, we also get an NPE in Job.updateStatus(), with the following exception:

      Caused by: java.lang.NullPointerException
      	at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
      	at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
      	at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
      	at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
      

      We found that state here is null, even though we already check that the job state is RUNNING, as in the code below:

        public boolean isComplete() throws IOException {
          ensureState(JobState.RUNNING);
          updateStatus();
          return status.isJobComplete();
        }
      

      The only plausible explanation is that two threads call this method at the same time: both pass the state check first, then one thread updates the state to null while the other thread hits the NPE.
      We should fix this NPE.
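
      The failure mode described above can be illustrated with a minimal sketch. It is not the real Job class and not the attached patch: StatusRaceSketch, Status, StatusClient, updateStatusUnsafe and updateStatusSafe are hypothetical names, and the sketch assumes the null comes from a lookup that returned nothing. The unsafe variant reads the job ID through a mutable field that an earlier caller may have overwritten with null; the safer variant uses an ID captured at construction time and never stores a null result.

```java
import java.io.IOException;

// Hypothetical, simplified stand-in for the pattern discussed above.
// None of these names are the real org.apache.hadoop.mapreduce.Job internals.
class StatusRaceSketch {

    /** Minimal status record: a job ID plus a completion flag. */
    static final class Status {
        final String jobId;
        final boolean complete;

        Status(String jobId, boolean complete) {
            this.jobId = jobId;
            this.complete = complete;
        }
    }

    /** Stand-in for the remote lookup; may return null when no status is available. */
    interface StatusClient {
        Status fetch(String jobId);
    }

    private final StatusClient client;
    private final String jobId;   // captured once, never reassigned
    private Status status;        // mutable; the unsafe path may overwrite it with null

    StatusRaceSketch(String jobId, StatusClient client) {
        this.jobId = jobId;
        this.client = client;
        this.status = new Status(jobId, false);
    }

    /** Unsafe: reads the job ID through the mutable field, which an earlier caller may have nulled. */
    synchronized void updateStatusUnsafe() {
        status = client.fetch(status.jobId);
    }

    /** Safer: uses the immutable job ID and never stores a null status. */
    synchronized void updateStatusSafe() throws IOException {
        Status fresh = client.fetch(jobId);
        if (fresh == null) {
            throw new IOException("Job status not available");
        }
        status = fresh;
    }

    synchronized boolean isComplete() throws IOException {
        updateStatusSafe();
        return status.complete;
    }

    public static void main(String[] args) {
        // The lookup always fails in this demo, mimicking missing job meta files.
        StatusRaceSketch job = new StatusRaceSketch("job_1", id -> null);
        job.updateStatusUnsafe();       // first caller: lookup fails, field becomes null
        try {
            job.updateStatusUnsafe();   // second caller: dereferences the null field
        } catch (NullPointerException e) {
            System.out.println("unsafe path hit NPE");
        }
        try {
            job.isComplete();           // safe path: fails with a checked exception instead
        } catch (IOException e) {
            System.out.println("safe path reported: " + e.getMessage());
        }
    }
}
```

      With the safer variant, a failed lookup surfaces as an IOException that the caller can handle, and a later call can still succeed once the status becomes available.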

        Attachments

        1. MAPREDUCE-6852.patch
          1 kB
          Junping Du
        2. MAPREDUCE-6852-v2.patch
          0.9 kB
          Junping Du

              People

              • Assignee: Junping Du (djp)
              • Reporter: Junping Du (djp)
              • Votes: 0
              • Watchers: 6
