Apache Tez / TEZ-3984

Shuffle: Out of Band DME event sending causes errors


    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.8.4, 0.9.1, 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels:

      Description

      When a task's Input throws an exception, LogicalIOProcessorRuntimeTask.cleanup() also closes the task's outputs.

      cleanup() ignores all events returned by the outputs' close() calls. However, if an output sends events out of band by calling outputContext.sendEvents(events) directly, those events bypass cleanup() and can reach the AM before the task failure is reported.
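      The ordering problem can be reproduced in a small, self-contained model. This is a sketch under stated assumptions: Am, OutputContext, Output and simulate() are hypothetical stand-ins (not Tez classes), events are plain strings, and the `guarded` flag shows one possible mitigation (dropping out-of-band events once the attempt is known to be failing). It is not necessarily what the attached patches implement.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the race described above. None of these
// classes are Tez types; events are plain strings for readability.
public class OutOfBandEventSketch {

    /** Stand-in for the AM: records every event it receives, in arrival order. */
    static final class Am {
        final List<String> received = new ArrayList<>();

        void deliver(String event) {
            received.add(event);
        }
    }

    /** Stand-in for an output context whose sendEvents() goes straight to the AM. */
    static final class OutputContext {
        private final Am am;
        private final boolean guarded;     // hypothetical mitigation switch
        private volatile boolean taskFailing = false;

        OutputContext(Am am, boolean guarded) {
            this.am = am;
            this.guarded = guarded;
        }

        void markTaskFailing() {
            taskFailing = true;
        }

        void sendEvents(List<String> events) {
            if (guarded && taskFailing) {
                return; // mitigation sketch: drop out-of-band events from a failing attempt
            }
            for (String event : events) {
                am.deliver(event);
            }
        }
    }

    /** Stand-in for an output that emits a DME out of band while closing. */
    static final class Output {
        private final OutputContext context;

        Output(OutputContext context) {
            this.context = context;
        }

        List<String> close() {
            // Out-of-band send: reaches the AM no matter what the caller
            // does with the list returned below.
            context.sendEvents(List.of("DME(empty partition)"));
            return List.of("DME(returned from close)");
        }
    }

    /** Input failed, cleanup closes the output, then the failure is reported. */
    static List<String> simulate(boolean guarded) {
        Am am = new Am();
        OutputContext context = new OutputContext(am, guarded);
        Output output = new Output(context);

        context.markTaskFailing();       // the Input has already thrown
        output.close();                  // cleanup() ignores the returned events
        am.deliver("TaskAttemptFailed"); // failure reported only after cleanup
        return am.received;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + simulate(false));
        System.out.println("guarded:   " + simulate(true));
    }
}
```

      Without the guard, the AM sees [DME(empty partition), TaskAttemptFailed]: the event that cleanup() "ignored" via the return value still arrived first. With the guard, only TaskAttemptFailed arrives.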

      This can cause correctness issues in shuffle: an input failure can lead to zero-sized DataMovementEvents (DMEs) being sent for the failing attempt, and downstream tasks may then never retry the fetch against the valid attempt.
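      To illustrate why a spurious zero-sized DME is harmful downstream, here is a hypothetical consumer that accepts the first DME per source index as final and ignores later ones. This first-event-wins rule is an assumption made for illustration; Tez's real shuffle handling of duplicate and obsolete events is more involved, and Consumer, Dme and acceptedAttempt() are invented names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a downstream consumer that treats the first DME per
// source index as authoritative. This is not Tez's shuffle implementation.
public class ZeroSizeDmeSketch {

    /** Simplified DataMovementEvent: source index, producing attempt, empty flag. */
    static final class Dme {
        final int srcIndex;
        final int attempt;
        final boolean empty;

        Dme(int srcIndex, int attempt, boolean empty) {
            this.srcIndex = srcIndex;
            this.attempt = attempt;
            this.empty = empty;
        }
    }

    static final class Consumer {
        // srcIndex -> attempt whose event was accepted as the input's final state
        private final Map<Integer, Integer> completed = new HashMap<>();
        int fetches = 0;

        void onEvent(Dme dme) {
            if (completed.containsKey(dme.srcIndex)) {
                return; // input already marked complete: later attempts are ignored
            }
            completed.put(dme.srcIndex, dme.attempt);
            if (!dme.empty) {
                fetches++; // a real consumer would schedule a fetch here
            }
        }

        Integer acceptedAttempt(int srcIndex) {
            return completed.get(srcIndex);
        }
    }

    public static void main(String[] args) {
        Consumer consumer = new Consumer();
        consumer.onEvent(new Dme(0, 0, true));  // zero-sized DME from failing attempt 0
        consumer.onEvent(new Dme(0, 1, false)); // real DME from successful attempt 1
        System.out.println("accepted attempt " + consumer.acceptedAttempt(0)
                + ", fetches " + consumer.fetches);
    }
}
```

      Under this assumed rule the consumer settles on attempt 0 with no data and never fetches from attempt 1, which matches the failure mode described above.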

        Attachments

        1. TEZ-3984.2-branch-0.9.patch
          8 kB
          Jaume M
        2. TEZ-3984-branch-0.9.patch
          8 kB
          Jaume M
        3. TEZ-3984.5.patch
          8 kB
          Gopal Vijayaraghavan
        4. TEZ-3984.5.patch
          8 kB
          Jaume M
        5. TEZ-3984.4.patch
          8 kB
          Jaume M
        6. TEZ-3984.3.patch
          8 kB
          Jaume M
        7. TEZ-3984.2.patch
          8 kB
          Jaume M
        8. TEZ-3984.1.patch
          8 kB
          Jaume M


              People

              • Assignee: Jaume M (jmarhuen)
                Reporter: Gopal Vijayaraghavan (gopalv)
              • Votes: 0
                Watchers: 6

                Dates

                • Created:
                  Updated: