Description
User-level explain plans have a section titled "Vertex dependency in root stage" which, according to its name, prints the dependencies between all vertices in the root stage.
This logic lives in DagJsonParser#print, and it can print the "Vertex dependency in root stage" header twice.
The logic in this method first extracts all stages and plans. It then iterates over all the stages, and if the stage contains any edges, it prints them out.
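The loop described above can be sketched roughly as follows. This is a simplified, hypothetical stand-in and not the actual DagJsonParser code: the class name, the method name, and the map from stage id to edge lines are assumptions made for illustration, and the assignment of edges to Stage-1 and Stage-2 is illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the printing loop; not the actual DagJsonParser code.
class RepeatedHeaderSketch {
    static final String HEADER = "Vertex dependency in root stage";

    // edgesByStage maps a stage id to its already formatted edge lines.
    static List<String> printDependencies(Map<String, List<String>> edgesByStage) {
        List<String> out = new ArrayList<>();
        for (List<String> edges : edgesByStage.values()) {
            // The only condition is "has edges"; nothing checks for the root stage.
            if (!edges.isEmpty()) {
                out.add(HEADER);
                out.addAll(edges);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative input: two stages with edges, one without.
        Map<String, List<String>> edgesByStage = new LinkedHashMap<>();
        edgesByStage.put("Stage-2", List.of("Reducer 10 <- Map 9 (GROUP)"));
        edgesByStage.put("Stage-1", List.of("Reducer 3 <- Reducer 2 (GROUP)"));
        edgesByStage.put("Stage-0", List.of());
        printDependencies(edgesByStage).forEach(System.out::println);
    }
}
```

With two stages that both contain edges, the header is emitted once per stage, which matches the duplicated section reported here.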
If we want to stay consistent with the header "Vertex dependency in root stage", we should add a check for whether the stage being processed in the iteration is the root stage.
Alternatively, we could print the edges for every stage and change the header from "Vertex dependency in root stage" to "Vertex dependency in [stage-id]".
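The two options could look roughly like this. It is a minimal sketch under stated assumptions, not a patch: the class and method names are hypothetical, stages are modeled as a map from stage id to edge lines, and because the description does not say how the root stage is identified, the root stage id is simply passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two proposed fixes; not the actual Hive code.
class VertexDependencyFixSketch {

    // Option 1: keep the existing header, but print edges for the root stage only.
    // How the root stage is determined is an assumption left to the caller.
    static List<String> rootOnly(Map<String, List<String>> edgesByStage, String rootStageId) {
        List<String> out = new ArrayList<>();
        List<String> edges = edgesByStage.getOrDefault(rootStageId, List.of());
        if (!edges.isEmpty()) {
            out.add("Vertex dependency in root stage");
            out.addAll(edges);
        }
        return out;
    }

    // Option 2: print edges for every stage, naming the stage in the header.
    static List<String> perStage(Map<String, List<String>> edgesByStage) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : edgesByStage.entrySet()) {
            if (!entry.getValue().isEmpty()) {
                out.add("Vertex dependency in " + entry.getKey());
                out.addAll(entry.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative input; "Stage-1" as root is chosen only for the demo.
        Map<String, List<String>> edgesByStage = new LinkedHashMap<>();
        edgesByStage.put("Stage-2", List.of("Reducer 10 <- Map 9 (GROUP)"));
        edgesByStage.put("Stage-1", List.of("Reducer 3 <- Reducer 2 (GROUP)"));
        rootOnly(edgesByStage, "Stage-1").forEach(System.out::println);
        perStage(edgesByStage).forEach(System.out::println);
    }
}
```

Option 1 keeps the output compact but hides the edges of non-root stages. Option 2 keeps all edges visible at the cost of changing the header text that existing users or tests may match on.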
I'm not sure if it's possible for Hive-on-Tez to create a plan with a non-root stage that contains edges, but it is possible for Hive-on-Spark (HoS support was added in HIVE-11133).
Example for HoS:
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.strict.checks.cartesian.product=false;
set hive.spark.explain.user=true;
set hive.spark.dynamic.partition.pruning=true;
EXPLAIN select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
Prints
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 10 <- Map 9 (GROUP)
Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)
Reducer 13 <- Map 12 (GROUP)

Vertex dependency in root stage
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)
Reducer 3 <- Reducer 2 (GROUP)
Reducer 5 <- Map 4 (GROUP)
Reducer 6 <- Reducer 5 (GROUP), Reducer 8 (GROUP)
Reducer 8 <- Map 7 (GROUP)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 3
      File Output Operator [FS_34]
        Group By Operator [GBY_32] (rows=1 width=8)
          Output:["_col0"],aggregations:["count(VALUE._col0)"]
        <-Reducer 2 [GROUP]
          GROUP [RS_31]
            Group By Operator [GBY_30] (rows=1 width=8)
              Output:["_col0"],aggregations:["count()"]
              Join Operator [JOIN_28] (rows=2200 width=10)
                condition map:[{"":"{\"type\":\"Inner\",\"left\":0,\"right\":1}"}],keys:{"0":"_col0","1":"_col0"}
              <-Map 1 [PARTITION-LEVEL SORT]
                PARTITION-LEVEL SORT [RS_26]
                  PartitionCols:_col0
                  Select Operator [SEL_2] (rows=2000 width=10)
                    Output:["_col0"]
                    TableScan [TS_0] (rows=2000 width=10)
                      default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
              <-Reducer 6 [PARTITION-LEVEL SORT]
                PARTITION-LEVEL SORT [RS_27]
                  PartitionCols:_col0
                  Group By Operator [GBY_24] (rows=1 width=184)
                    Output:["_col0"],keys:KEY._col0
                  <-Reducer 5 [GROUP]
                    GROUP [RS_23]
                      PartitionCols:_col0
                      Group By Operator [GBY_22] (rows=2 width=184)
                        Output:["_col0"],keys:_col0
                        Filter Operator [FIL_9] (rows=1 width=184)
                          predicate:_col0 is not null
                          Group By Operator [GBY_7] (rows=1 width=184)
                            Output:["_col0"],aggregations:["max(VALUE._col0)"]
                          <-Map 4 [GROUP]
                            GROUP [RS_6]
                              Group By Operator [GBY_5] (rows=1 width=184)
                                Output:["_col0"],aggregations:["max(ds)"]
                                Select Operator [SEL_4] (rows=2000 width=10)
                                  Output:["ds"]
                                  TableScan [TS_3] (rows=2000 width=10)
                                    default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
                  <-Reducer 8 [GROUP]
                    GROUP [RS_23]
                      PartitionCols:_col0
                      Group By Operator [GBY_22] (rows=2 width=184)
                        Output:["_col0"],keys:_col0
                        Filter Operator [FIL_17] (rows=1 width=184)
                          predicate:_col0 is not null
                          Group By Operator [GBY_15] (rows=1 width=184)
                            Output:["_col0"],aggregations:["min(VALUE._col0)"]
                          <-Map 7 [GROUP]
                            GROUP [RS_14]
                              Group By Operator [GBY_13] (rows=1 width=184)
                                Output:["_col0"],aggregations:["min(ds)"]
                                Select Operator [SEL_12] (rows=2000 width=10)
                                  Output:["ds"]
                                  TableScan [TS_11] (rows=2000 width=10)
                                    default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
    Stage-2
      Reducer 11
So there are two sections that say Vertex dependency in root stage.
Attachments
Issue Links
- is related to HIVE-11133 Support hive.explain.user for Spark (Closed)