Description
■Use Cases
- Use multiple insert clauses within a single query to insert multiple static (user-defined) partitions into a single table.
- Use advanced aggregation (cube) features within each insert clause to include subtotals of columns for each partition
■Expected Behaviour
- Subtotals are inserted for all combinations of the set of columns
■Observed Behaviour
- No subtotals are not inserted for any combination of the set of columns
■Sample Queries
// Create test tables create table if not exists table1 ( column1 string, column2 string, column3 int ) stored as orc tblproperties ( "orc.compress" = "SNAPPY" ); create table if not exists table2 ( column1 string, column2 string, column3 int ) partitioned by ( partition1 string ) stored as orc tblproperties ( "orc.compress" = "SNAPPY" ); create table if not exists table3 ( column1 string, column2 string, column3 int ) partitioned by ( partition1 string ) stored as orc tblproperties ( "orc.compress" = "SNAPPY" );
// Insert test values insert overwrite table table1 values ('value1', 'value1', 1), ('value2', 'value2', 1), ('value3', 'value3', 1);
// Single insert clause with multiple inserts syntax // Subtotals are inserted into target table from table1 insert overwrite table table2 partition ( partition1 = 'value1' ) select column1, column2, sum(column3) as column3 group by column1, column2 with cube;
// Multiple insert clauses with multiple inserts syntax // Subtotals are not inserted into target table from table1 insert overwrite table table3 partition ( partition1 = 'value1' ) select column1, column2, sum(column3) as column3 group by column1, column2 with cube insert overwrite table table3 partition ( partition1 = 'value2' ) select column1, column2, sum(column3) as column3 group by column1, column2 with cube;
■Executions Plans
- Single insert clause with multiple inserts syntax
STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: table1 Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: column1 (type: string), column2 (type: string), column3 (type: int) outputColumnNames: column1, column2, column3 Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(column3) keys: column1 (type: string), column2 (type: string), '0' (type: string) mode: hash outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string) sort order: +++ Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string) Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE Column stats: NONE value expressions: _col3 (type: bigint) Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string) mode: mergepartial outputColumnNames: _col0, _col1, _col3 Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE pruneGroupingSetId: true Select Operator expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col3) (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: zzz_james_ball.table2 Stage: Stage-0 Move Operator tables: partition: partition1 value1 replace: true table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: zzz_james_ball.table2 Stage: Stage-2 Stats-Aggr Operator
- Single insert clause with multiple inserts syntax
STAGE DEPENDENCIES: Stage-2 is a root stage Stage-0 depends on stages: Stage-2 Stage-3 depends on stages: Stage-0 Stage-1 depends on stages: Stage-2 Stage-4 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-2 Map Reduce Map Operator Tree: TableScan alias: table1 Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: column1 (type: string), column2 (type: string), column3 (type: int) outputColumnNames: column1, column2, column3 Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: column1 (type: string), column2 (type: string) sort order: ++ Map-reduce partition columns: column1 (type: string), column2 (type: string) Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE value expressions: column3 (type: int) Reduce Operator Tree: Forward Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string), KEY._col1 (type: string) mode: complete outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col2) (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: zzz_james_ball.table3 Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string), KEY._col1 (type: string) mode: complete outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col2) (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: zzz_james_ball.table3 Stage: Stage-0 Move Operator tables: partition: partition1 value1 replace: true table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: zzz_james_ball.table3 Stage: Stage-3 Stats-Aggr Operator Stage: Stage-1 Move Operator tables: partition: partition1 value2 replace: true table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: zzz_james_ball.table3 Stage: Stage-4 Stats-Aggr Operator
■Notes
- This problem occurs with all advanced aggregation features (cube, grouping sets, rollup)
- This problem occurs whether hive.map.aggr is set to true or false
- Dynamic partitions are not used because the partition values are set manually within the where conditions of each insert clause