TEZ-1642 (Apache Tez)

TestAMRecovery sometimes fails

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.4
    • Component/s: None
    • Labels: None

    Description

      TestAMRecovery sometimes fails in testVertexPartiallyFinished_XXX.
      The scenario: we want to kill the AM while a vertex is partially finished (of its 2 tasks, task_0 has finished and task_1 is still running). On recovery, task_0 should not rerun and task_1 should rerun. (We use the recovery log, specifically TaskAttemptFinishedEvent, to judge whether a task was rerun.)
      Currently we use VertexManager.onSourceTaskCompleted to decide when to kill the AM, but this is not reliable. VertexManager.onSourceTaskCompleted is not invoked at the moment the task attempt finishes: the TaskAttempt sends an event to the Task to report that it has finished, and the Task then sends an event to the Vertex, which triggers VM.onSourceTaskCompleted.
      So the following ordering is possible: task_0 finished -> task_1 finished -> VM.onSourceTaskCompleted -> VM.onSourceTaskCompleted
      In this case, we treat the vertex as partially completed at the first VM.onSourceTaskCompleted, but the vertex is actually fully completed.
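      The ordering problem above can be illustrated with a small, self-contained sketch. It does not use any Tez classes: the Kind enum, the two event orderings, and finishedAtEachCallback are hypothetical names invented for this illustration. The sketch replays a list of events and records how many task attempts had already finished (and so would already appear in the recovery log) at each onSourceTaskCompleted-style callback.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecoveryRaceSketch {
    // Hypothetical event kinds for illustration only; these are not Tez classes.
    enum Kind { ATTEMPT_FINISHED, SOURCE_TASK_COMPLETED }

    // Ordering the test implicitly assumes: each callback directly follows its attempt.
    static List<Kind> assumedOrder() {
        return Arrays.asList(
            Kind.ATTEMPT_FINISHED, Kind.SOURCE_TASK_COMPLETED,
            Kind.ATTEMPT_FINISHED, Kind.SOURCE_TASK_COMPLETED);
    }

    // Ordering described in the issue: both attempts finish before either callback runs.
    static List<Kind> racyOrder() {
        return Arrays.asList(
            Kind.ATTEMPT_FINISHED, Kind.ATTEMPT_FINISHED,
            Kind.SOURCE_TASK_COMPLETED, Kind.SOURCE_TASK_COMPLETED);
    }

    // Replays the events in order and returns, for each callback, the number of
    // attempts that had already finished when that callback was delivered.
    static List<Integer> finishedAtEachCallback(List<Kind> events) {
        int finished = 0;
        List<Integer> seen = new ArrayList<>();
        for (Kind k : events) {
            if (k == Kind.ATTEMPT_FINISHED) {
                finished++;
            } else {
                seen.add(finished);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(finishedAtEachCallback(assumedOrder())); // [1, 2]
        System.out.println(finishedAtEachCallback(racyOrder()));    // [2, 2]
    }
}
```

      In the racy ordering, the first callback already sees 2 finished attempts, so killing the AM at that point no longer corresponds to a partially finished vertex.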

      Attachments

        1. TEZ-1642.patch
          8 kB
          Jeff Zhang
        2. TEZ-1642-2.patch
          8 kB
          Jeff Zhang
        3. TEZ-1642-3.patch
          23 kB
          Jeff Zhang
        4. TEZ-1642-4.patch
          37 kB
          Jeff Zhang
        5. TEZ-1642-5.patch
          17 kB
          Jeff Zhang

          People

            Assignee: Jeff Zhang (zjffdu)
            Reporter: Jeff Zhang (zjffdu)
            Votes: 0
            Watchers: 3
