Hadoop Map/Reduce / MAPREDUCE-7131

Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.4
    • Fix Version/s: 2.10.0, 3.2.0, 2.8.5, 2.7.8, 3.0.4, 2.9.2, 3.1.2
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      This is the race condition that can occur:

      1. during the first scanIntermediateDirectory(), HistoryFileInfo.moveToDone() is scheduled for job j1
      2. during the second scanIntermediateDirectory(), j1 is found again and put in the fileStatusList to process
      3. HistoryFileInfo.moveToDone() is processed in another thread and history files are moved to the finished directory
      4. the HistoryFileInfo for j1 is removed from jobListCache
      5. the j1 in fileStatusList is processed: a new HistoryFileInfo for j1 is created and added to the jobListCache (its history, conf, and summary file paths point to the intermediate user directory, and its state is IN_INTERMEDIATE)
      6. moveToDone() is scheduled for this new j1
      7. moveToDone() fails during moveToDoneNow() for the history file because the source path in the intermediate directory does not exist
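
The interleaving above can be replayed deterministically in a single thread. The following is a minimal sketch under simplifying assumptions, not the real HistoryFileManager code: `HistoryRaceReplay`, `FileInfo`, the in-memory `fs` set, and the `intermediate/` and `done/` path strings are all hypothetical stand-ins, and the "scheduled" move is modeled simply by calling it later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Single-threaded replay of the seven steps. All names are hypothetical
// stand-ins for illustration, not the real HistoryFileManager classes.
final class HistoryRaceReplay {

    // Stand-in for HistoryFileInfo: records where the server believes the file is.
    static final class FileInfo {
        final String path;
        FileInfo(String jobId) { this.path = "intermediate/" + jobId + ".jhist"; }
    }

    final Set<String> fs = new HashSet<>();                      // paths that currently exist
    final Map<String, FileInfo> jobListCache = new HashMap<>();  // jobId -> cached info
    boolean lastMoveSucceeded;

    // Stand-in for moveToDone(): fails when the source path no longer exists.
    boolean moveToDone(String jobId, FileInfo info) {
        if (!fs.remove(info.path)) {
            return false;
        }
        fs.add("done/" + jobId + ".jhist");
        return true;
    }

    static HistoryRaceReplay replay() {
        HistoryRaceReplay sim = new HistoryRaceReplay();
        String file = "intermediate/j1.jhist";
        sim.fs.add(file);

        // 1. First scan: j1 is cached and its move is scheduled (deferred here).
        FileInfo first = new FileInfo("j1");
        sim.jobListCache.put("j1", first);

        // 2. Second scan: the file is still listed, so j1 is queued again.
        List<String> fileStatusList = new ArrayList<>();
        if (sim.fs.contains(file)) {
            fileStatusList.add("j1");
        }

        // 3. The deferred move from step 1 runs and succeeds.
        sim.moveToDone("j1", first);
        // 4. The original entry is removed from the cache.
        sim.jobListCache.remove("j1");

        // 5. The stale listing is processed: a fresh entry pointing at intermediate.
        for (String jobId : fileStatusList) {
            sim.jobListCache.put(jobId, new FileInfo(jobId));
        }

        // 6-7. The move for the new entry is attempted and fails.
        sim.lastMoveSucceeded = sim.moveToDone("j1", sim.jobListCache.get("j1"));
        return sim;
    }

    public static void main(String[] args) {
        HistoryRaceReplay sim = replay();
        String cachedPath = sim.jobListCache.get("j1").path;
        System.out.println("second move succeeded: " + sim.lastMoveSucceeded);
        System.out.println("cache points to: " + cachedPath);
        System.out.println("file exists there: " + sim.fs.contains(cachedPath));
    }
}
```

In this sketch the second move fails, and the cache is left with an entry whose path refers to a file that only exists under `done/`.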

      From this point on, for as long as the new j1 HistoryFileInfo remains in the jobListCache, the JobHistoryServer believes the history file is still in the intermediate directory. A user who queries this job in the JobHistoryServer UI gets the following error:

      org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history file <scheme>://<host>:<port>/mr-history/intermediate/<user>/job_1529348381246_27275711-1535123223269-<user>-<jobname>-1535127026668-1-0-SUCCEEDED-<queue>-1535126980787.jhist
      

      I noticed this issue while running 2.7.4, but the race condition still appears to exist in trunk.
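
One general way to close this kind of window is to re-validate a stale directory listing under the same lock that guards the move, so that an entry is only cached if its file is still in the intermediate directory. The sketch below illustrates that idea only; it is an assumption for discussion and not necessarily what the attached patches do. `GuardedScanSketch`, `addFromListing`, and the in-memory sets are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: the scan and the move share one lock, and the
// scan re-checks that the file is still in intermediate before caching it.
final class GuardedScanSketch {
    private final Object lock = new Object();

    // All three collections are guarded by `lock`.
    final Set<String> intermediateFiles = new HashSet<>();
    final Set<String> doneFiles = new HashSet<>();
    final Map<String, String> jobListCache = new HashMap<>();  // jobId -> believed location

    // Called for each job found in a (possibly stale) directory listing.
    boolean addFromListing(String jobId) {
        synchronized (lock) {
            if (!intermediateFiles.contains(jobId)) {
                return false;  // already moved since the listing was taken; skip it
            }
            jobListCache.putIfAbsent(jobId, "intermediate");
            return true;
        }
    }

    // Moves the file and drops the cache entry as one atomic step.
    boolean moveToDone(String jobId) {
        synchronized (lock) {
            if (!intermediateFiles.remove(jobId)) {
                return false;  // source is gone; nothing to move
            }
            doneFiles.add(jobId);
            jobListCache.remove(jobId);
            return true;
        }
    }
}
```

With this arrangement, a listing taken before the move cannot reintroduce an entry after the move completes. The trade-off is that the existence check runs while the lock is held, which may be costly when the check is a remote filesystem call.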

        Attachments

        1. MAPREDUCE-7131.6.patch
          5 kB
          Anthony Hsu
        2. MAPREDUCE-7131.5.patch
          5 kB
          Anthony Hsu
        3. MAPREDUCE-7131.4.patch
          9 kB
          Anthony Hsu
        4. MAPREDUCE-7131.3.patch
          9 kB
          Anthony Hsu
        5. MAPREDUCE-7131.2.patch
          4 kB
          Anthony Hsu
        6. MAPREDUCE-7131.1.patch
          4 kB
          Anthony Hsu

              People

              • Assignee: Anthony Hsu (erwaman)
              • Reporter: Anthony Hsu (erwaman)
              • Votes: 0
              • Watchers: 6
