  1. Apache Tez
  2. TEZ-15 Support for DAG AM recovery
  3. TEZ-2431

Recovery of task events (e.g., data movement events) should not depend on the ordering of task attempt events

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix

    Description

      Today, task attempt events must pass through VertexImpl before reaching the task in order to maintain the ordering guarantees that recovery relies on. As a result, each of these events is routed through the dispatcher twice, which can add noticeable delay in large jobs. It also builds in assumptions about event ordering that make the system fragile. Recovery should work independently of other system interactions, so that other components can evolve without being affected by recovery unless a change affects recovery logically.
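
      One way to picture the goal is a recovery step whose result does not depend on the order in which logged events are replayed. The sketch below is only an illustration under stated assumptions: it is not Tez code, and the names RecoveredEvent, attemptId, eventIndex, and recover are hypothetical. It assumes each logged event carries the id of the attempt that produced it and its position in that attempt's output, so the recovered state can be rebuilt from the events alone rather than from dispatcher ordering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OrderIndependentRecovery {

    /** Hypothetical log record: not a Tez class. */
    static final class RecoveredEvent {
        final String attemptId; // assumed: id of the producing task attempt
        final int eventIndex;   // assumed: position within that attempt's output
        final String payload;   // stands in for e.g. a data movement event

        RecoveredEvent(String attemptId, int eventIndex, String payload) {
            this.attemptId = attemptId;
            this.eventIndex = eventIndex;
            this.payload = payload;
        }
    }

    /** Rebuilds per-attempt event lists from a log whose records may arrive in any order. */
    static Map<String, List<String>> recover(List<RecoveredEvent> log) {
        Map<String, TreeMap<Integer, String>> byAttempt = new TreeMap<>();
        for (RecoveredEvent e : log) {
            // Keying by (attemptId, eventIndex) makes replay idempotent and order-insensitive.
            byAttempt.computeIfAbsent(e.attemptId, k -> new TreeMap<>())
                     .putIfAbsent(e.eventIndex, e.payload);
        }
        Map<String, List<String>> result = new TreeMap<>();
        byAttempt.forEach((attempt, events) ->
                result.put(attempt, new ArrayList<>(events.values())));
        return result;
    }

    public static void main(String[] args) {
        List<RecoveredEvent> log = new ArrayList<>();
        log.add(new RecoveredEvent("attempt_1", 1, "dme-b"));
        log.add(new RecoveredEvent("attempt_2", 0, "dme-c"));
        log.add(new RecoveredEvent("attempt_1", 0, "dme-a"));

        Map<String, List<String>> first = recover(log);
        Collections.reverse(log); // replay the same records in the opposite order
        Map<String, List<String>> second = recover(log);

        System.out.println(first);
        System.out.println(first.equals(second));
    }
}
```

      In this sketch the ordering information travels with each event, so nothing upstream has to deliver events in a particular sequence for recovery to produce the same state.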

            People

              Assignee: Unassigned
              Reporter: Bikas Saha (bikassaha)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: