[TEZ-2103] Implement a Partial completion VertexManagerPlugin - ASF JIRA

Details

Type: New Feature
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
- gsoc
- gsoc2015
- hadoop
- java
- tez

Description

Currently, there is no sibling communication between tasks. This means that a vertex's work can effectively be completed by the first task in a wave of tasks, but the entire wave of tasks has to complete before success can be reported.

This occurs in limit + filter query patterns common between the data access engines.

select * from data where x > 1 limit 10;

This query will run a full table scan's worth of tasks, each generating up to 10 rows, and then aggregate their output to produce the final 10-row result.

The VertexManager receives counters/events early enough to short-circuit the rest of the vertex's tasks: once the limit condition has been satisfied by an initial subset of the tasks, it can prevent the remaining tasks from being scheduled.

This is a specialization of the VertexManagerPlugin for this common-case scheduling pattern.
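
The short-circuit described above can be illustrated with a minimal, self-contained sketch. It is not the Tez VertexManagerPlugin API and not the attached patch: the class `LimitShortCircuitScheduler` and all of its methods are hypothetical names chosen for illustration, and the sketch assumes that each completed task reports its output row count back to the scheduler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; none of these names come from Tez.
class LimitShortCircuitScheduler {
    private final long rowLimit;                        // e.g. 10 for "limit 10"
    private final Deque<Integer> pendingTasks = new ArrayDeque<>();
    private long rowsProduced = 0;                      // summed from task reports
    private int tasksScheduled = 0;
    private int tasksSkipped = 0;

    LimitShortCircuitScheduler(int numTasks, long rowLimit) {
        this.rowLimit = rowLimit;
        for (int i = 0; i < numTasks; i++) {
            pendingTasks.add(i);
        }
    }

    // Called when a task finishes; outputRows stands in for a task counter.
    void onTaskCompleted(long outputRows) {
        rowsProduced += outputRows;
    }

    boolean limitSatisfied() {
        return rowsProduced >= rowLimit;
    }

    // Returns the task ids of the next wave, or an empty list once the
    // limit is satisfied. Remaining tasks are then dropped, never scheduled.
    List<Integer> scheduleNextWave(int waveSize) {
        List<Integer> wave = new ArrayList<>();
        if (limitSatisfied()) {
            tasksSkipped += pendingTasks.size();
            pendingTasks.clear();
            return wave;
        }
        while (wave.size() < waveSize && !pendingTasks.isEmpty()) {
            wave.add(pendingTasks.poll());
        }
        tasksScheduled += wave.size();
        return wave;
    }

    int tasksScheduled() {
        return tasksScheduled;
    }

    int tasksSkipped() {
        return tasksSkipped;
    }
}
```

In this sketch, a vertex with 100 tasks, a wave size of 4, and a limit of 10 schedules only the first wave if those tasks report at least 10 rows in total; the other 96 tasks are skipped. Tasks already running in the current wave are not cancelled here, which matches the issue's stated goal of preventing the remainder from being scheduled.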

Attachments

- TEZ-2103.WIP.patch, 18/May/20 11:21, 6 kB, Syed Shameerur Rahman
- TEZ-2103.01.patch, 02/Jul/20 13:11, 6 kB, Syed Shameerur Rahman
- TEZ-2103.02.patch, 04/Aug/20 06:05, 7 kB, Syed Shameerur Rahman
- TEZ-2103.03.patch, 04/Aug/20 07:05, 7 kB, Syed Shameerur Rahman

People

Assignee: Syed Shameerur Rahman

Reporter: Gopal Vijayaraghavan

Votes: 0

Watchers: 11

Dates

Created: 14/Feb/15 01:12

Updated: 20/Jan/21 07:44