Hadoop Map/Reduce / MAPREDUCE-4691

Historyserver can report "Unknown job" after RM says job has completed

    Details

    • Hadoop Flags:
      Reviewed

      Description

      Example traceback from the client:

      2012-09-27 20:28:38,068 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
      2012-09-27 20:28:38,530 [main] WARN  org.apache.hadoop.mapred.ClientServiceDelegate - Error from remote end: Unknown job job_1348097917603_3019
      2012-09-27 20:28:38,530 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:xxx (auth:KERBEROS) cause:org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Unknown job job_1348097917603_3019
      2012-09-27 20:28:38,531 [main] WARN  org.apache.pig.tools.pigstats.JobStats - Failed to get map task report
      RemoteTrace: 
       at LocalTrace: 
              org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Unknown job job_1348097917603_3019
              at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:156)
              at $Proxy11.getJobReport(Unknown Source)
              at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:116)
              at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:298)
              at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:383)
              at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:482)
              at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:184)
      ...
      
      1. MR-4691.txt
        4 kB
        Robert Joseph Evans

        Activity

        Jason Lowe added a comment -

        There is a race condition in the historyserver where two threads can be trying to scan the same user's done intermediate directory for two separate jobs. One thread will win the race and update the user timestamp in HistoryFileManager.scanIntermediateDirectory before it has actually completed the scan. The second thread will then see the timestamp has been updated, think there's no point in doing a scan, and return with no job found.
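The ordering problem described above can be sketched in isolation. This is a minimal illustration, not the HistoryFileManager code and not necessarily what the attached patch does: the class `UserDirScanSketch`, its fields, and the `afterTimestamp` hook are all hypothetical names. The racy interleaving is replayed on a single thread through the hook so the outcome is deterministic, and the safer variant simply holds a lock for the whole scan and records the timestamp only after the scan has finished.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of per-user scan state; names are illustrative only. */
class UserDirScanSketch {
    private final Set<String> knownJobs = ConcurrentHashMap.newKeySet();
    private long scannedModTime = -1L; // directory mod time as of the last scan

    /** Racy order: the timestamp is published before the scan has run. */
    boolean racyLookup(String jobId, long dirModTime, List<String> listing,
                       Runnable afterTimestamp) {
        boolean mustScan;
        synchronized (this) {
            mustScan = scannedModTime != dirModTime;
            scannedModTime = dirModTime; // a later caller now sees "already scanned"
        }
        if (mustScan) {
            afterTimestamp.run();        // window in which a second caller can arrive
            knownJobs.addAll(listing);   // the scan itself
        }
        return knownJobs.contains(jobId);
    }

    /** Safer order: scan under the lock, then record the timestamp. */
    synchronized boolean safeLookup(String jobId, long dirModTime,
                                    List<String> listing) {
        if (scannedModTime != dirModTime) {
            knownJobs.addAll(listing);
            scannedModTime = dirModTime; // only after the scan completed
        }
        return knownJobs.contains(jobId);
    }

    /** Replays the bad interleaving; returns {firstCallerFound, secondCallerFound}. */
    static boolean[] racyReplay() {
        UserDirScanSketch s = new UserDirScanSketch();
        List<String> listing = Arrays.asList("job_1", "job_2");
        boolean[] second = new boolean[1];
        // The hook stands in for a second thread arriving mid-scan.
        boolean first = s.racyLookup("job_1", 100L, listing,
            () -> second[0] = s.racyLookup("job_2", 100L, listing, () -> { }));
        return new boolean[] { first, second[0] };
    }

    /** Runs two safeLookup calls on separate threads; returns both results. */
    static boolean[] safeFromTwoThreads() {
        UserDirScanSketch s = new UserDirScanSketch();
        List<String> listing = Arrays.asList("job_1", "job_2");
        boolean[] found = new boolean[2];
        Thread a = new Thread(() -> found[0] = s.safeLookup("job_1", 100L, listing));
        Thread b = new Thread(() -> found[1] = s.safeLookup("job_2", 100L, listing));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return found;
    }

    public static void main(String[] args) {
        boolean[] racy = racyReplay();
        System.out.println("racy: first=" + racy[0] + ", second=" + racy[1]);
        boolean[] safe = safeFromTwoThreads();
        System.out.println("safe: first=" + safe[0] + ", second=" + safe[1]);
    }
}
```

In the racy replay the second caller sees the updated timestamp, skips the scan, and reports the job as missing, which matches the "Unknown job" symptom. In the safer variant both callers find their jobs regardless of which thread wins the lock.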

        Robert Joseph Evans added a comment -

        This patch should fix the issue. Recreating the problem in a unit test is not simple, so I have not added any new ones. I am going to verify the fix manually, but I wanted to kick off Jenkins in parallel.

        Robert Joseph Evans added a comment -

        Oops, forgot to submit the patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12547008/MR-4691.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2890//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2890//console

        This message is automatically generated.

        Robert Joseph Evans added a comment -

        From my manual tests it looks like this patch does indeed solve the issue at hand.

        Jason Lowe added a comment -

        +1 lgtm, will commit this shortly.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2848 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2848/)
        MAPREDUCE-4691. Historyserver can report "Unknown job" after RM says job has completed. Contributed by Robert Joseph Evans. (Revision 1391671)

        Result = SUCCESS
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1391671
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
        Jason Lowe added a comment -

        Thanks, Bobby. I merged this into trunk, branch-2, and branch-0.23.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2807 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2807/)
        MAPREDUCE-4691. Historyserver can report "Unknown job" after RM says job has completed. Contributed by Robert Joseph Evans. (Revision 1391671)

        Result = FAILURE
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1391671
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2785 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2785/)
        MAPREDUCE-4691. Historyserver can report "Unknown job" after RM says job has completed. Contributed by Robert Joseph Evans. (Revision 1391671)

        Result = SUCCESS
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1391671
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #389 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/389/)
        svn merge -c 1391671 FIXES: MAPREDUCE-4691. Historyserver can report "Unknown job" after RM says job has completed. Contributed by Robert Joseph Evans. (Revision 1391677)

        Result = UNSTABLE
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1391677
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1180 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1180/)
        MAPREDUCE-4691. Historyserver can report "Unknown job" after RM says job has completed. Contributed by Robert Joseph Evans. (Revision 1391671)

        Result = SUCCESS
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1391671
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1211 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1211/)
        MAPREDUCE-4691. Historyserver can report "Unknown job" after RM says job has completed. Contributed by Robert Joseph Evans. (Revision 1391671)

        Result = FAILURE
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1391671
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java

          People

          • Assignee:
            Robert Joseph Evans
            Reporter:
            Jason Lowe
          • Votes:
            0
            Watchers:
            7
