  1. Apache Tez
  2. TEZ-924

InputFailedEvent handling for Shuffle

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Duplicate

    Description

      Shuffle receives batches of Events to process from the AM. Because of the way these events are sent to the ShuffleHandlers and the way they are processed, Shuffle can start fetching data for an Event that is subsequently marked as failed (via an InputFailedEvent).

      1) The AM sends events in batches. An InputFailedEvent for a specific Input may not be in the same batch as the original event that it marks as bad.

      2) The ShuffleEventHandler processes the events in each batch one at a time, so even if the InputFailedEvent follows in the same batch, Shuffle can start fetching data from a failed Input.

      The AM needs to change to invalidate Inputs up front, so that related events don't span batches. Alternatively, it needs to apply the InputFailedEvent to the original event being sent.
      The Shuffle itself should process a batch update as a batch. That would prevent fetchers from starting early while additional events for the same host may still be waiting in the batch.
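
      The "process a batch as a batch" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: BatchShuffleSketch, Event, Kind, data, failed and resolveBatch are hypothetical names, not Tez classes or APIs, and an Input is identified here by an assumed (inputIndex, attempt) pair. The sketch first collects every failure in the batch and only then decides which inputs may be handed to fetchers, so a failure event cancels a matching data event regardless of their order within the batch. It does not address the cross-batch case from point 1, which would still need the AM-side change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BatchShuffleSketch {
    // Hypothetical stand-ins for the two event types discussed in the issue.
    enum Kind { DATA_AVAILABLE, INPUT_FAILED }

    static final class Event {
        final Kind kind;
        final int inputIndex;
        final int attempt;
        final String host; // null for failure events in this sketch

        Event(Kind kind, int inputIndex, int attempt, String host) {
            this.kind = kind;
            this.inputIndex = inputIndex;
            this.attempt = attempt;
            this.host = host;
        }

        String key() {
            return inputIndex + "_" + attempt;
        }
    }

    static Event data(int inputIndex, int attempt, String host) {
        return new Event(Kind.DATA_AVAILABLE, inputIndex, attempt, host);
    }

    static Event failed(int inputIndex, int attempt) {
        return new Event(Kind.INPUT_FAILED, inputIndex, attempt, null);
    }

    /**
     * Resolves a whole batch before any fetch is scheduled.
     * Returns "host/inputIndex_attempt" entries that are safe to fetch.
     */
    static List<String> resolveBatch(List<Event> batch) {
        // Pass 1: collect every input attempt marked as failed in this batch.
        Set<String> failedKeys = new HashSet<>();
        for (Event e : batch) {
            if (e.kind == Kind.INPUT_FAILED) {
                failedKeys.add(e.key());
            }
        }
        // Pass 2: keep only data events that no failure in the batch invalidates.
        List<String> fetchable = new ArrayList<>();
        for (Event e : batch) {
            if (e.kind == Kind.DATA_AVAILABLE && !failedKeys.contains(e.key())) {
                fetchable.add(e.host + "/" + e.key());
            }
        }
        return fetchable;
    }

    public static void main(String[] args) {
        List<Event> batch = Arrays.asList(
            data(0, 0, "hostA"),
            data(1, 0, "hostA"),
            failed(0, 0),          // invalidates the first event above
            data(0, 1, "hostB"));  // a later attempt of the same input
        System.out.println(resolveBatch(batch)); // prints [hostA/1_0, hostB/0_1]
    }
}
```

      With one-event-at-a-time processing, a fetch for input 0, attempt 0 could already be running by the time the failure event is seen. Resolving the batch first avoids that within a single batch.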

            People

              Assignee: Unassigned
              Reporter: Siddharth Seth (sseth)