Description
Because of HIVE-8793, the work graph for Spark may be modified by SplitSparkWorkResolver. Original graph:
Spark Edges:
  Reducer 2 <- Map 1 (SORT, 1)
  Reducer 3 <- Reducer 2 (GROUP, 1)
  Reducer 4 <- Reducer 2 (GROUP, 1)
New graph:
Spark Edges:
  Reducer 3 <- Reducer 5 (GROUP, 1)
  Reducer 4 <- Reducer 6 (GROUP, 1)
  Reducer 5 <- Map 1 (SORT, 1)
  Reducer 6 <- Map 1 (SORT, 1)
where Reducer 2 was split into Reducer 5 and Reducer 6.
Two types of ordering can be considered:
1. Topological order
Spark Edges:
  Reducer 5 <- Map 1 (SORT, 1)
  Reducer 6 <- Map 1 (SORT, 1)
  Reducer 3 <- Reducer 5 (GROUP, 1)
  Reducer 4 <- Reducer 6 (GROUP, 1)
2. DFS
Spark Edges:
  Reducer 5 <- Map 1 (SORT, 1)
  Reducer 3 <- Reducer 5 (GROUP, 1)
  Reducer 6 <- Map 1 (SORT, 1)
  Reducer 4 <- Reducer 6 (GROUP, 1)
Both seem better than the current ordering, though topological order seems more suitable for a graph. Please feel free to create a patch on trunk if needed.
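As a rough illustration only (this is not the actual SplitSparkWorkResolver code, and the class and method names below are hypothetical), a topological ordering like option 1 can be produced with Kahn's algorithm over the "child <- parent" edges from the explain output above:

{code:java}
import java.util.*;

/**
 * Minimal sketch: topologically order the works of the split graph from the
 * description, so Reducer 5 and Reducer 6 come before Reducer 3 and Reducer 4.
 * Not the Hive SparkWork API; works are represented as plain strings.
 */
public class TopoOrderSketch {

  // Kahn's algorithm over a child -> parents map (matching the
  // "child <- parent" notation of the explain output).
  static List<String> topologicalOrder(Map<String, List<String>> parentsOf) {
    Map<String, Integer> inDegree = new LinkedHashMap<>();
    Map<String, List<String>> childrenOf = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> e : parentsOf.entrySet()) {
      inDegree.putIfAbsent(e.getKey(), 0);
      for (String parent : e.getValue()) {
        inDegree.putIfAbsent(parent, 0);
        inDegree.merge(e.getKey(), 1, Integer::sum);
        childrenOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(e.getKey());
      }
    }
    // Start from works with no parents (e.g. Map 1).
    Deque<String> ready = new ArrayDeque<>();
    for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
      if (e.getValue() == 0) {
        ready.add(e.getKey());
      }
    }
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String work = ready.poll();
      order.add(work);
      for (String child : childrenOf.getOrDefault(work, Collections.emptyList())) {
        // A child becomes ready once all of its parents have been emitted.
        if (inDegree.merge(child, -1, Integer::sum) == 0) {
          ready.add(child);
        }
      }
    }
    return order;
  }

  public static void main(String[] args) {
    // Split graph from the description: Reducer 2 was split into 5 and 6.
    Map<String, List<String>> parentsOf = new LinkedHashMap<>();
    parentsOf.put("Reducer 3", Arrays.asList("Reducer 5"));
    parentsOf.put("Reducer 4", Arrays.asList("Reducer 6"));
    parentsOf.put("Reducer 5", Arrays.asList("Map 1"));
    parentsOf.put("Reducer 6", Arrays.asList("Map 1"));
    // Prints: [Map 1, Reducer 5, Reducer 6, Reducer 3, Reducer 4]
    System.out.println(topologicalOrder(parentsOf));
  }
}
{code}

The resulting order (Map 1, Reducer 5, Reducer 6, Reducer 3, Reducer 4) matches the topological ordering shown in option 1 above.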
Issue Links
- relates to HIVE-8793: Refactor to make splitting SparkWork a physical resolver [Spark Branch] (Resolved)