Apache Tez / TEZ-2103

Implement a Partial completion VertexManagerPlugin


    Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None

      Description

      Currently, there is no sibling communication between tasks. This means that the work of a vertex can effectively be completed by the first task in a wave of tasks, yet the entire wave of tasks still has to complete before success can be reported.

      This occurs in the limit + filter query pattern that is common across data access engines.

      select * from data where x > 1 limit 10;
      

      will run through a full-table scan's worth of tasks, each generating up to 10 rows, which are then aggregated to produce the final 10-row result.

      The VertexManager receives counters/events early enough to short-circuit the rest of the vertex's tasks, preventing the remaining tasks from being scheduled once the limit condition has been satisfied by an initial subset of the tasks.

      This is a specialization of the VertexManagerPlugin for this common scheduling pattern.
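
The short-circuit idea described above can be sketched as a small standalone simulation. This is only an illustration under stated assumptions: it does not use the real Tez `VertexManagerPlugin` API and is not taken from the attached patches. The class `PartialCompletionSketch`, the `Result` type, the `run` method, and the per-task row counts are all hypothetical names and data. Tasks are scheduled one wave at a time, the row counters they report are summed, and no further wave is scheduled once the limit is met.

```java
/**
 * Hypothetical sketch of limit-aware, wave-by-wave scheduling.
 * This is NOT the Tez VertexManagerPlugin API; all names are illustrative.
 */
public class PartialCompletionSketch {

    /** Outcome of one simulated vertex run. */
    public static final class Result {
        public final int tasksScheduled;   // tasks that were actually started
        public final long rowsCollected;   // rows kept, capped at the limit

        Result(int tasksScheduled, long rowsCollected) {
            this.tasksScheduled = tasksScheduled;
            this.rowsCollected = rowsCollected;
        }
    }

    /**
     * Schedules tasks in waves of {@code waveSize}. After each wave finishes,
     * the reported row counters are summed; if the limit is already met,
     * the remaining tasks are never scheduled.
     *
     * @param rowsPerTask rows each task would emit after filtering (assumed input)
     * @param waveSize    number of tasks that run concurrently in one wave
     * @param limit       the LIMIT value of the query
     */
    public static Result run(long[] rowsPerTask, int waveSize, long limit) {
        if (waveSize <= 0) {
            throw new IllegalArgumentException("waveSize must be positive");
        }
        int scheduled = 0;
        long rows = 0;
        // Keep scheduling waves only while the limit is unmet and tasks remain.
        while (scheduled < rowsPerTask.length && rows < limit) {
            int end = Math.min(scheduled + waveSize, rowsPerTask.length);
            for (int i = scheduled; i < end; i++) {
                rows += rowsPerTask[i]; // counter reported by a completed task
            }
            scheduled = end;
        }
        return new Result(scheduled, Math.min(rows, limit));
    }

    public static void main(String[] args) {
        long[] rowsPerTask = {4, 3, 5, 7, 2, 9, 1, 6};
        Result r = run(rowsPerTask, 2, 10);
        System.out.println(r.tasksScheduled + " of " + rowsPerTask.length
                + " tasks scheduled, " + r.rowsCollected + " rows");
    }
}
```

With these made-up counts and a wave size of 2, the first wave yields 7 rows and the second brings the total past 10, so only 4 of the 8 tasks are scheduled. The sketch checks the limit only at wave boundaries; a real plugin could react to counters from individual tasks as they arrive.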

        Attachments

        1. TEZ-2103.WIP.patch
          6 kB
          Syed Shameerur Rahman
        2. TEZ-2103.01.patch
          6 kB
          Syed Shameerur Rahman
        3. TEZ-2103.02.patch
          7 kB
          Syed Shameerur Rahman
        4. TEZ-2103.03.patch
          7 kB
          Syed Shameerur Rahman


            People

            • Assignee:
              srahman Syed Shameerur Rahman
              Reporter:
              gopalv Gopal Vijayaraghavan
