Hadoop Map/Reduce
MAPREDUCE-4595

TestLostTracker failing - possibly due to a race in JobHistory.JobHistoryFilesManager#run()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.3
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels:
    • Hadoop Flags: Reviewed

      Description

      The occasional failures of TestLostTracker appear to be caused by the following race:

      On job completion, JobHistoryFilesManager#run() spawns another thread to move the history files to the done folder. TestLostTracker waits only for job completion before checking the format of the history file. At that point, however, the move might still be in progress or might not have started at all.

      The attachment (force-TestLostTracker-failure.patch) helps reproduce the error locally by increasing the chance of hitting this race.
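
The race described above can be illustrated outside Hadoop. The following is a minimal sketch, not the attached patch: the class `HistoryMoveWait`, the helper `waitFor`, the file name `job_0001`, and the delays are all hypothetical, and a plain thread stands in for the background mover. It shows one common way for a test to avoid this kind of race, which is to poll for the moved file with a timeout instead of checking right after job completion.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BooleanSupplier;

public class HistoryMoveWait {

    /**
     * Polls until the condition holds or the timeout expires.
     * Returns true if the condition became true in time, false otherwise.
     * (Hypothetical helper, not part of Hadoop.)
     */
    public static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        Path running = Files.createTempDirectory("history");
        Path done = Files.createTempDirectory("done");
        // Placeholder content; the real history file format is not modeled here.
        Path src = Files.write(running.resolve("job_0001"), "placeholder".getBytes());
        Path dst = done.resolve("job_0001");

        // Stand-in for the background mover: the "job" has already completed,
        // but the file reaches the done folder only after a delay.
        Thread mover = new Thread(() -> {
            try {
                Thread.sleep(200);
                Files.move(src, dst);
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        mover.start();

        // Checking dst immediately here would usually find no file (the race).
        // Polling with a timeout waits for the move to finish instead.
        boolean moved = waitFor(() -> Files.exists(dst), 5_000, 20);
        System.out.println("moved=" + moved);
        mover.join();
    }
}
```

Polling keeps the test independent of how long the move takes, at the cost of a bounded wait when the move never happens. Whether the actual fix uses polling or some other form of synchronization is not stated in this issue.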

      1. MR-4595.patch
        1 kB
        Karthik Kambatla
      2. MR-4595.patch
        1 kB
        Karthik Kambatla
      3. force-TestLostTracker-failure.patch
        0.9 kB
        Karthik Kambatla

        Activity

        Karthik Kambatla (Inactive) created issue -
        Karthik Kambatla (Inactive) made changes -
        Field: Original Value -> New Value
        Attachment: (empty) -> force-TestLostTracker-failure.patch [ 12542429 ]
        Karthik Kambatla (Inactive) made changes -
        Assignee: (empty) -> Karthik Kambatla [ kkambatl ]
        Karthik Kambatla (Inactive) made changes -
        Attachment: (empty) -> MR-4595.patch [ 12542627 ]
        Karthik Kambatla (Inactive) made changes -
        Status: Open [ 1 ] -> In Progress [ 3 ]
        Karthik Kambatla (Inactive) made changes -
        Status: In Progress [ 3 ] -> Patch Available [ 10002 ]
        Karthik Kambatla (Inactive) made changes -
        Description: the final sentence "I am uploading a patch that significantly increases the chance of hitting this race." was replaced with "The attachment (force-TestLostTracker-failure.patch) helps reproducing the error locally, by increasing the chance of hitting this race."; the rest of the description is unchanged.
        Karthik Kambatla (Inactive) made changes -
        Attachment: (empty) -> MR-4595.patch [ 12542684 ]
        Alejandro Abdelnur made changes -
        Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
        Hadoop Flags: (empty) -> Reviewed [ 10343 ]
        Fix Version/s: (empty) -> 1.2.0 [ 12321661 ]
        Resolution: (empty) -> Fixed [ 1 ]
        Matt Foley made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]
        Gavin made changes -
        Assignee: Karthik Kambatla [ kkambatl ] -> Karthik Kambatla [ kasha ]
        Gavin made changes -
        Reporter: Karthik Kambatla [ kkambatl ] -> Karthik Kambatla [ kasha ]

          People

          • Assignee: Karthik Kambatla
          • Reporter: Karthik Kambatla
          • Votes: 0
          • Watchers: 6
