Hadoop Map/Reduce: MAPREDUCE-4595

TestLostTracker failing - possibly due to a race in JobHistory.JobHistoryFilesManager#run()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.3
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels:
    • Hadoop Flags: Reviewed

      Description

      The occasional failure of TestLostTracker appears to be caused by the following:

      On job completion, JobHistoryFilesManager#run() spawns another thread to move the history files to the done folder. TestLostTracker waits for job completion before checking the format of the history file. However, at that point the move may still be in progress, or may not have started at all.

      The attachment (force-TestLostTracker-failure.patch) helps reproduce the error locally by increasing the chance of hitting this race.
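
The race described above can be illustrated with a minimal, self-contained Java sketch. It is not Hadoop code: the class `HistoryMoveRace`, its fields, and `completeJob()` are hypothetical stand-ins for the job-completion path and the thread that moves history files, and a latch plays the role of the delay that the force-failure patch introduces.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

public class HistoryMoveRace {
    // Hypothetical stand-in for "the history file is now in the done folder".
    public final AtomicBoolean movedToDone = new AtomicBoolean(false);
    // Holds the mover thread back, playing the role of an artificial delay.
    public final CountDownLatch allowMove = new CountDownLatch(1);
    // Counted down once the mover thread has finished its work.
    public final CountDownLatch moveFinished = new CountDownLatch(1);

    private final ExecutorService mover = Executors.newSingleThreadExecutor();

    /** Reports the job as complete and schedules the move; does not wait for it. */
    public void completeJob() {
        mover.execute(() -> {
            try {
                allowMove.await();      // the move has not happened yet
                movedToDone.set(true);  // the "move" itself
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                moveFinished.countDown();
            }
        });
        // Returning here is the point at which a caller sees "job complete".
    }

    public void shutdown() {
        mover.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        HistoryMoveRace race = new HistoryMoveRace();
        race.completeJob();
        // A check made right after completion can observe the file as not yet moved.
        System.out.println("moved right after completion: " + race.movedToDone.get());
        race.allowMove.countDown();
        race.moveFinished.await();
        System.out.println("moved after waiting: " + race.movedToDone.get());
        race.shutdown();
    }
}
```

In this sketch a check made immediately after `completeJob()` returns sees the file as not yet moved, which mirrors a test that inspects the history file as soon as the job is reported complete.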

      1. force-TestLostTracker-failure.patch
        0.9 kB
        Karthik Kambatla
      2. MR-4595.patch
        1 kB
        Karthik Kambatla
      3. MR-4595.patch
        1 kB
        Karthik Kambatla

        Activity

        Karthik Kambatla added a comment -

        Uploading a patch for branch-1.

        I understand it is not the absolute fool approach, as the test still fails if the thread moving the file takes longer than 5 minutes. However, it is a cause of concern if it takes longer than that.

        Please feel free to suggest alternate/better approaches.

        Karthik Kambatla added a comment -
        • I meant "foolproof approach" in my previous comment.

        With the patch, the test passes even with the sleep in JobHistoryFilesManager#run() that the force-failure patch adds.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12542627/MR-4595.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2774//console

        This message is automatically generated.

        Karthik Kambatla added a comment -

        Uploading a new patch that incorporates Alejandro's offline comments:

        • Use while(max-wait-time) instead of for(i < 10)
        • Sleep for shorter time (50 ms)
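
The two changes listed above amount to a deadline-bounded polling loop. The following is a minimal sketch of that pattern, not the contents of MR-4595.patch: the class `BoundedWait`, the method `waitFor`, and its parameters are hypothetical names, and the sketch uses Java 8 lambdas for brevity. The 5-minute limit and 50 ms sleep are the values mentioned in this thread.

```java
import java.util.function.BooleanSupplier;

public class BoundedWait {
    /**
     * Polls the condition every pollMillis until it holds or maxWaitMillis elapses.
     * Returns true if the condition held before the deadline, false on timeout.
     */
    public static boolean waitFor(BooleanSupplier condition, long maxWaitMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            // Compare by subtraction so the check stays correct if nanoTime wraps.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Hypothetical condition: becomes true about 200 ms after start.
        boolean done = waitFor(() -> System.currentTimeMillis() - start >= 200,
                5 * 60 * 1000L, 50L);
        System.out.println("condition met: " + done);
    }
}
```

Compared with a fixed number of iterations, bounding the loop by elapsed time keeps the maximum wait independent of the sleep interval, so the interval can be shortened without reducing the total time the test is willing to wait.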
        Alejandro Abdelnur added a comment -

        +1

        Alejandro Abdelnur added a comment -

        Thanks Karthik. Committed to branch-1.

        Matt Foley added a comment -

        Closed upon release of Hadoop 1.2.0.


          People

          • Assignee: Karthik Kambatla
          • Reporter: Karthik Kambatla
          • Votes: 0
          • Watchers: 5
