Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
During discussions with users, there was feedback that edge properties need better names to make them clearer. One suggestion was to look at MPI for inspiration. Based on that feedback, the proposal is to rename ConnectionPattern to DataMovement, as that is essentially what the property defines. A Bipartite connection pattern can be constructed from both broadcast and scatter-gather data movement types. There will be 3 kinds of data movement initially.
ONE_TO_ONE - Defines that the output produced by the ith upstream task is available to the ith downstream task.
BROADCAST - Defines that the output produced by any upstream task is available to all downstream tasks.
SCATTER_GATHER - Defines that the ith output produced by all upstream tasks is available to the same downstream task. Upstream tasks scatter their outputs and they are gathered by designated downstream tasks.
To be clear, output being available to a task does not imply that the entire output is transferred to or read by it. The task can choose to read any amount of the total data.
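The three data movement types described above can be illustrated with a small routing sketch. This is not Tez code: the class `DataMovementSketch` and the method `consumers` are hypothetical names, and the sketch assumes that ONE_TO_ONE edges connect vertices with equal parallelism and that, for SCATTER_GATHER, the ith output is gathered by the downstream task with index i. It only models which downstream tasks an output is available to, not how much of it they read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of the Tez API.
class DataMovementSketch {

    // Mirrors the three names proposed in this issue.
    enum DataMovementType { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }

    /**
     * Returns the indices of the downstream tasks to which one output is available.
     *
     * @param type          data movement type of the edge
     * @param upstreamTask  index of the upstream task that produced the output
     * @param outputIndex   index of the output within that upstream task
     * @param numDownstream number of downstream tasks
     */
    static List<Integer> consumers(DataMovementType type, int upstreamTask,
                                   int outputIndex, int numDownstream) {
        List<Integer> result = new ArrayList<>();
        switch (type) {
            case ONE_TO_ONE:
                // Output of the ith upstream task goes to the ith downstream task
                // (assumes both vertices have the same number of tasks).
                result.add(upstreamTask);
                break;
            case BROADCAST:
                // Output of any upstream task is available to every downstream task.
                for (int d = 0; d < numDownstream; d++) {
                    result.add(d);
                }
                break;
            case SCATTER_GATHER:
                // The ith output of every upstream task is gathered by the same
                // downstream task (assumed here to be the task with index i).
                result.add(outputIndex);
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(consumers(DataMovementType.ONE_TO_ONE, 2, 0, 4));
        System.out.println(consumers(DataMovementType.BROADCAST, 2, 0, 4));
        System.out.println(consumers(DataMovementType.SCATTER_GATHER, 2, 1, 4));
    }
}
```

In the SCATTER_GATHER case the upstream task index does not affect the result, which is the point of the pattern: every upstream task sends its ith output to the same place.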
Current users: In the EdgeProperty object
Please change EdgeConnectionPattern.BIPARTITE -> DataMovementType.SCATTER_GATHER
Please change SourceType.STABLE -> DataSourceType.PERSISTED
Please add SchedulingType.SEQUENTIAL to EdgeProperty objects.
The getter methods have similar name changes.
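As a rough picture of what an edge looks like after the renames listed above, here is a self-contained sketch using stand-in types. It does not import Tez: `EdgePropertySketch`, its constructor, and the getter names are assumptions chosen to match the new enum names, and the enums list only the constants mentioned in this issue. The real EdgeProperty also carries other information that is omitted here.

```java
// Hypothetical stand-ins for the renamed types; not the Tez classes themselves.
class EdgePropertyMigrationSketch {

    enum DataMovementType { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }
    enum DataSourceType { PERSISTED }   // only the constant named in this issue
    enum SchedulingType { SEQUENTIAL }  // only the constant named in this issue

    // Minimal holder for the three properties discussed in this issue.
    static final class EdgePropertySketch {
        private final DataMovementType dataMovementType;
        private final DataSourceType dataSourceType;
        private final SchedulingType schedulingType;

        EdgePropertySketch(DataMovementType dataMovementType,
                           DataSourceType dataSourceType,
                           SchedulingType schedulingType) {
            this.dataMovementType = dataMovementType;
            this.dataSourceType = dataSourceType;
            this.schedulingType = schedulingType;
        }

        // Getter names are assumed to follow the new type names.
        DataMovementType getDataMovementType() { return dataMovementType; }
        DataSourceType getDataSourceType() { return dataSourceType; }
        SchedulingType getSchedulingType() { return schedulingType; }
    }

    // An edge that previously used EdgeConnectionPattern.BIPARTITE and
    // SourceType.STABLE would, after the rename, be described like this.
    static EdgePropertySketch migratedBipartiteEdge() {
        return new EdgePropertySketch(
                DataMovementType.SCATTER_GATHER,
                DataSourceType.PERSISTED,
                SchedulingType.SEQUENTIAL);
    }

    public static void main(String[] args) {
        EdgePropertySketch edge = migratedBipartiteEdge();
        System.out.println(edge.getDataMovementType() + " "
                + edge.getDataSourceType() + " "
                + edge.getSchedulingType());
    }
}
```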
Attachments
Issue Links
- duplicates TEZ-406 Refactor Edge API's to clarify connection patterns (Resolved)