Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
2.3.8, 3.1.3, 4.0.0-beta-1
Description
Hive doesn't fill in the statistics of materialized CTEs, and the size of those TableScans is underestimated. That causes Tez to run with fewer tasks or to fail with OOM because MapJoin could be wrongly applied.
The following example shows Map 1 reading `src` generates 493 rows, but Map 3 reading `cte` is expected to scan only 1 row.
0: jdbc:hive2://hive-hiveserver2:10000/defaul> EXPLAIN WITH cte AS ( . . . . . . . . . . . . . . . . . . . . . . .> SELECT * FROM src . . . . . . . . . . . . . . . . . . . . . . .> ) . . . . . . . . . . . . . . . . . . . . . . .> SELECT * . . . . . . . . . . . . . . . . . . . . . . .> FROM cte a . . . . . . . . . . . . . . . . . . . . . . .> JOIN cte b ON (a.key = b.key) . . . . . . . . . . . . . . . . . . . . . . .> JOIN cte c ON (a.key = c.key); ... +----------------------------------------------------+ | Explain | +----------------------------------------------------+ | Plan optimized by CBO. | | | | Vertex dependency in Stage-4 | | Map 2 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE) | | Reducer 4 <- Map 3 (SIMPLE_EDGE) | | | | Stage-3 | | Fetch Operator | | limit:-1 | | Stage-4 | | Map 2 vectorized | | File Output Operator [FS_69] | | Map Join Operator [MAPJOIN_68] (rows=1 width=444) | | Conds:MAPJOIN_67._col0=RS_61._col0(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5"] | | <-Map 3 [BROADCAST_EDGE] vectorized | | SHUFFLE [RS_61] | | PartitionCols:_col0 | | Select Operator [SEL_60] (rows=1 width=368) | | Output:["_col0","_col1"] | | Filter Operator [FIL_59] (rows=1 width=368) | | predicate:key is not null | | TableScan [TS_11] (rows=1 width=368) | | default@cte,c,Tbl:COMPLETE,Col:NONE,Output:["key","value"] | | <-Map Join Operator [MAPJOIN_67] (rows=1 width=404) | | Conds:SEL_66._col0=RS_64._col0(Inner),Output:["_col0","_col1","_col2","_col3"] | | <-Reducer 4 [BROADCAST_EDGE] vectorized | | BROADCAST [RS_64] | | PartitionCols:_col0 | | Select Operator [SEL_63] | | Output:["_col0","_col1"] | | <-Select Operator [SEL_66] (rows=1 width=368) | | Output:["_col0","_col1"] | | Filter Operator [FIL_65] (rows=1 width=368) | | predicate:key is not null | | TableScan [TS_5] (rows=1 width=368) | | default@cte,a,Tbl:COMPLETE,Col:NONE,Output:["key","value"] | | Stage-2 | | Dependency Collection{} | | Stage-1 | | Map 1 vectorized | | File Output Operator [FS_4] | | table:{"name:":"default.cte"} | | Select Operator [SEL_3] (rows=493 width=350) | | Output:["_col0","_col1"] | | TableScan [TS_0] (rows=493 width=350) | | default@src,src,Tbl:COMPLETE,Col:NONE,Output:["key","value"] | | Stage-0 | | Move Operator | | Please refer to the previous Stage-1 | | | +----------------------------------------------------+
Attachments
Issue Links
- causes
-
HIVE-28098 Fails to copy empty column statistics of materialized CTE
- Resolved
- links to