Details
- Bug
- Status: Resolved
- Major
- Resolution: Duplicate
- 4.0.0
- None
- None
Description
Reproducer
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.stats.autogather=true;
create table t11(i int, j int) partitioned by (s string);
insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');

hive> desc formatted t11 j;
OK
col_name                j
data_type               int
min
max
num_nulls
distinct_count
avg_col_len
max_col_len
num_trues
num_falses
bitVector
comment                 from deserializer
COLUMN_STATS_ACCURATE   {}

hive> explain insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: _dummy_table
                  Row Limit Per Split: 1
                  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: array(const struct(3,4,'p1'),const struct(4,5,'p2'),const struct(6,9,'p3')) (type: array<struct<col1:int,col2:int,col3:string>>)
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
                    UDTF Operator
                      Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
                      function name: inline
                      Select Operator
                        expressions: col1 (type: int), col2 (type: int), col3 (type: string)
                        outputColumnNames: _col0, _col1, _col2
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col2 (type: string)
                          sort order: +
                          Map-reduce partition columns: _col2 (type: string)
                          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col0 (type: int), _col1 (type: int)
        Reducer 2
            Execution mode: vectorized
            Reduce Operator Tree:
              Select Operator
                expressions: VALUE._col0 (type: int), VALUE._col1 (type: int), KEY._col2 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Dp Sort State: PARTITION_SORTED
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: default.t11

  Stage: Stage-2
    Dependency Collection

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            s
          replace: false
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.t11

  Stage: Stage-3
    Stats Work
      Basic Stats Work:
      Column Stats Desc:
          Columns: i, j
          Column Types: int, int
          Table: default.t11

Notice that the explain plan is missing the autogather stats branch, and the desc formatted output above shows no column statistics for j.
Issue Links
- is duplicated by
- HIVE-16100 Dynamic Sorted Partition optimizer loses sibling operators
- Closed