[HIVE-15581] Unable to use advanced aggregation with multiple inserts clause - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 1.2.1
Fix Version/s: None
Component/s: Hive
Labels:
- newbie

Description

■Use Cases

Use multiple insert clauses within a single query to insert multiple static (user-defined) partitions into a single table.
Use advanced aggregation (cube) features within each insert clause to include subtotals of columns for each partition

■Expected Behaviour

Subtotals are inserted for all combinations of the set of columns

■Observed Behaviour

No subtotals are not inserted for any combination of the set of columns

■Sample Queries

// Create test tables
create table if not exists
	table1
	(
		column1 string,
		column2 string,
		column3 int
		)
	stored as orc
	tblproperties
	(
		"orc.compress" = "SNAPPY"
		);

create table if not exists
	table2
	(
		column1 string,
		column2 string,
		column3 int
		)
	partitioned by
	(
		partition1 string
		)
	stored as orc
	tblproperties
	(
		"orc.compress" = "SNAPPY"
		);

create table if not exists
	table3
	(
		column1 string,
		column2 string,
		column3 int
		)
	partitioned by
	(
		partition1 string
		)
	stored as orc
	tblproperties
	(
		"orc.compress" = "SNAPPY"
		);

// Insert test values
insert overwrite table
	table1
	values
		('value1', 'value1', 1),
		('value2', 'value2', 1),
		('value3', 'value3', 1);

// Single insert clause with multiple inserts syntax
// Subtotals are inserted into target table
from
	table1
insert overwrite table
	table2
	partition
	(
		partition1 = 'value1'
		)
	select
		column1,
		column2,
		sum(column3) as column3
	group by
		column1,
		column2
	with cube;

// Multiple insert clauses with multiple inserts syntax
// Subtotals are not inserted into target table
from
	table1
insert overwrite table
	table3
	partition
	(
		partition1 = 'value1'
		)
	select
		column1,
		column2,
		sum(column3) as column3
	group by
		column1,
		column2
	with cube
insert overwrite table
	table3
	partition
	(
		partition1 = 'value2'
		)
	select
		column1,
		column2,
		sum(column3) as column3
	group by
		column1,
		column2
	with cube;

■Executions Plans

Single insert clause with multiple inserts syntax

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: table1
            Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: column1 (type: string), column2 (type: string), column3 (type: int)
              outputColumnNames: column1, column2, column3
              Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: sum(column3)
                keys: column1 (type: string), column2 (type: string), '0' (type: string)
                mode: hash
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                  sort order: +++
                  Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                  Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col3 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col3
          Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE
          pruneGroupingSetId: true
          Select Operator
            expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col3) (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                  name: zzz_james_ball.table2

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            partition1 value1
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: zzz_james_ball.table2

  Stage: Stage-2
    Stats-Aggr Operator

Single insert clause with multiple inserts syntax

STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0
  Stage-1 depends on stages: Stage-2
  Stage-4 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: table1
            Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: column1 (type: string), column2 (type: string), column3 (type: int)
              outputColumnNames: column1, column2, column3
              Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: column1 (type: string), column2 (type: string)
                sort order: ++
                Map-reduce partition columns: column1 (type: string), column2 (type: string)
                Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
                value expressions: column3 (type: int)
      Reduce Operator Tree:
        Forward
          Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE
          Group By Operator
            aggregations: sum(VALUE._col0)
            keys: KEY._col0 (type: string), KEY._col1 (type: string)
            mode: complete
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col2) (type: int)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: zzz_james_ball.table3
          Group By Operator
            aggregations: sum(VALUE._col0)
            keys: KEY._col0 (type: string), KEY._col1 (type: string)
            mode: complete
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col2) (type: int)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: zzz_james_ball.table3

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            partition1 value1
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: zzz_james_ball.table3

  Stage: Stage-3
    Stats-Aggr Operator

  Stage: Stage-1
    Move Operator
      tables:
          partition:
            partition1 value2
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: zzz_james_ball.table3

  Stage: Stage-4
    Stats-Aggr Operator

■Notes

This problem occurs with all advanced aggregation features (cube, grouping sets, rollup)
This problem occurs whether hive.map.aggr is set to true or false
Dynamic partitions are not used because the partition values are set manually within the where conditions of each insert clause

Unable to use advanced aggregation with multiple inserts clause

Details

Description

Attachments

Activity

People

Dates