[HIVE-18174] Vectorization: De-dup Group-by key expressions (identical keys are irrelevant) - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: Vectorization
Labels: None

Description

set hive.vectorized.execution.reduce.enabled=true;
set hive.vectorized.execution.reduce.groupby.enabled=true;
create temporary table foo (x int) stored as orc;
insert into foo values(1),(2),(3);
insert into foo values(1),(2),(3);
set hive.cbo.enable=false;
select distinct concat('x', x) x, concat('x', x), 'Foo', 'Foo' from foo;

Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0
        at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:476)
        at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:288)

The reduce-side Group By key list contains duplicate references: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: gopal_20171128220857_9c9def2e-d0a4-461a-8fd6-f9fdaea2d5ce:26
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: 
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: foo
                  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: x (type: int)
                    outputColumnNames: x
                    Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      keys: concat('x', x) (type: string), concat('x', x) (type: string), 'Foo' (type: string), 'Foo' (type: string)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col1 (type: string), 'Foo' (type: string)
                        sort order: ++
                        Map-reduce partition columns: _col1 (type: string), 'Foo' (type: string)
                        Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
        Reducer 2 
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Group By Operator
                keys: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col1 (type: string), _col1 (type: string), 'Foo' (type: string), 'Foo' (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3
                  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
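
The de-duplication named in the issue title could look roughly like the sketch below. It is not Hive code: `GroupByKeyDedup`, `Result`, and `dedup` are hypothetical names, and key expressions are modeled as plain strings rather than Hive expression descriptors. The idea is to keep each distinct key expression once and record, for every original key position, which unique key it should read from.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hive.
final class GroupByKeyDedup {

    // Unique keys in first-seen order, plus a map from each original
    // key position to the index of its unique key.
    static final class Result {
        final List<String> uniqueKeys;
        final int[] outputToUnique;

        Result(List<String> uniqueKeys, int[] outputToUnique) {
            this.uniqueKeys = uniqueKeys;
            this.outputToUnique = outputToUnique;
        }
    }

    // Assumption: two key expressions are identical when their string
    // forms are equal. A real planner would compare expression trees.
    static Result dedup(List<String> keyExprs) {
        Map<String, Integer> firstSeen = new LinkedHashMap<>();
        int[] mapping = new int[keyExprs.size()];
        for (int i = 0; i < keyExprs.size(); i++) {
            String expr = keyExprs.get(i);
            Integer idx = firstSeen.get(expr);
            if (idx == null) {
                // First occurrence: assign the next unique index.
                idx = firstSeen.size();
                firstSeen.put(expr, idx);
            }
            mapping[i] = idx;
        }
        return new Result(new ArrayList<>(firstSeen.keySet()), mapping);
    }
}
```

Applied to the Reducer 2 keys above (KEY._col0, KEY._col0, 'Foo', 'Foo'), this sketch keeps two unique keys, KEY._col0 and 'Foo', and maps the four output positions to unique indexes 0, 0, 1, 1, so the duplicated output columns can be filled from the de-duplicated keys.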

Issue Links

is duplicated by

HIVE-18258 Vectorization: Reduce-Side GROUP BY MERGEPARTIAL with duplicate columns is broken (Closed)

People

Assignee: Unassigned

Reporter: Gopal Vijayaraghavan

Votes: 0

Watchers: 1

Dates

Created: 29/Nov/17 03:09

Updated: 03/Jan/18 09:59

Resolved: 03/Jan/18 09:59