  1. Apache Tez
  2. TEZ-3910

Single node can cause Tez job to fail during shuffle

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.9.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      There is a race in which a downstream task that hits fetch failures caused by bad output from an upstream task can keep blaming itself for those failures before the AM can re-run the offending upstream task and clear the fetch failure. As a result, the DAG can fail even when only a single node fails.
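
      The race described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Tez code and not the attached patch: the class name, the method, the per-attempt failure threshold, the attempt limit, and the "tick" clock are all hypothetical and chosen purely for illustration. It shows how a downstream task could use up all of its attempts on self-blamed fetch failures before a re-run of the upstream task has produced good output.

```java
public class ShuffleBlameSketch {

    /**
     * Toy model of the race (all parameters are hypothetical, not Tez settings).
     *
     * @param failuresBeforeSelfBlame fetch failures one downstream attempt
     *                                tolerates before it fails itself
     * @param maxAttempts             downstream attempts allowed before the
     *                                task, and with it the DAG, fails
     * @param upstreamRerunDoneAtTick first tick at which the re-run upstream
     *                                output is available to fetch
     * @return true if the downstream task fetches good output in time,
     *         false if it exhausts its attempts first
     */
    public static boolean dagSurvives(int failuresBeforeSelfBlame,
                                      int maxAttempts,
                                      int upstreamRerunDoneAtTick) {
        int tick = 0; // one tick = one fetch try
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int failures = 0;
            while (failures < failuresBeforeSelfBlame) {
                tick++;
                if (tick >= upstreamRerunDoneAtTick) {
                    return true; // good output arrived, fetch succeeds
                }
                failures++; // bad upstream output, counted against this attempt
            }
            // The attempt blames itself and fails; the next attempt starts over.
        }
        return false; // all attempts used up before the upstream re-run finished
    }

    public static void main(String[] args) {
        // The re-run finishes while the last attempt is still fetching.
        System.out.println(dagSurvives(3, 4, 10));
        // The re-run finishes one tick too late: every attempt has failed.
        System.out.println(dagSurvives(3, 4, 13));
    }
}
```

      In this model the outcome depends only on whether the upstream re-run finishes before the downstream attempts run out, which is the timing dependence the description points at.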

      Attachments

        1. TEZ-3910.005.patch
          49 kB
          Kuhu Shukla
        2. TEZ-3910.004.patch
          46 kB
          Kuhu Shukla
        3. TEZ-3910.003.patch
          45 kB
          Kuhu Shukla
        4. TEZ-3910.002.patch
          46 kB
          Kuhu Shukla
        5. TEZ-3910.001.patch
          49 kB
          Kuhu Shukla

          People

            Assignee: Kuhu Shukla (kshukla)
            Reporter: Kuhu Shukla (kshukla)
            Votes: 0
            Watchers: 7
