  1. Hive
  2. HIVE-18174

Vectorization: De-dup Group-by key expressions (identical keys are irrelevant)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Vectorization
    • Labels: None

    Description

      set hive.vectorized.execution.reduce.enabled=true;
      set hive.vectorized.execution.reduce.groupby.enabled=true;
      create temporary table foo (x int) stored as orc;
      insert into foo values(1),(2),(3);
      insert into foo values(1),(2),(3);
      set hive.cbo.enable=false;
      select distinct concat('x', x) x, concat('x', x), 'Foo', 'Foo' from foo;
      
      Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0
              at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:476)
              at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:288)
      

      The Group By Operator in Reducer 2 has duplicate key references: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)

      STAGE PLANS:
        Stage: Stage-1
          Tez
            DagId: gopal_20171128220857_9c9def2e-d0a4-461a-8fd6-f9fdaea2d5ce:26
            Edges:
              Reducer 2 <- Map 1 (SIMPLE_EDGE)
            DagName: 
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: foo
                        Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: x (type: int)
                          outputColumnNames: x
                          Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                          Group By Operator
                            keys: concat('x', x) (type: string), concat('x', x) (type: string), 'Foo' (type: string), 'Foo' (type: string)
                            mode: hash
                            outputColumnNames: _col0, _col1, _col2, _col3
                            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              key expressions: _col1 (type: string), 'Foo' (type: string)
                              sort order: ++
                              Map-reduce partition columns: _col1 (type: string), 'Foo' (type: string)
                              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                  Execution mode: vectorized, llap
                  LLAP IO: all inputs
              Reducer 2 
                  Execution mode: vectorized, llap
                  Reduce Operator Tree:
                    Group By Operator
                      keys: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)
                      mode: mergepartial
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col1 (type: string), _col1 (type: string), 'Foo' (type: string), 'Foo' (type: string)
                        outputColumnNames: _col0, _col1, _col2, _col3
                        Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
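
The de-duplication named in the title can be sketched as follows. This is a minimal illustration, not Hive's actual fix: the class `GroupByKeyDedup`, its `Result` type, and the `dedup` method are hypothetical names, and plain string equality stands in for whatever expression-equality check the planner would use. The idea is to keep only the first occurrence of each key expression for grouping, and to record which distinct key each original output position should read from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not part of Hive. Key expressions are modeled as strings.
public class GroupByKeyDedup {

    /** Distinct keys plus, for each original key position, its index in the distinct list. */
    public static final class Result {
        public final List<String> uniqueKeys;
        public final int[] originalToUnique;

        Result(List<String> uniqueKeys, int[] originalToUnique) {
            this.uniqueKeys = uniqueKeys;
            this.originalToUnique = originalToUnique;
        }
    }

    public static Result dedup(List<String> keyExprs) {
        // LinkedHashMap preserves first-seen order of the distinct keys.
        Map<String, Integer> firstIndex = new LinkedHashMap<>();
        int[] mapping = new int[keyExprs.size()];
        for (int i = 0; i < keyExprs.size(); i++) {
            String expr = keyExprs.get(i);
            Integer idx = firstIndex.get(expr);
            if (idx == null) {
                // First time this expression is seen: assign the next distinct slot.
                idx = firstIndex.size();
                firstIndex.put(expr, idx);
            }
            mapping[i] = idx;
        }
        return new Result(new ArrayList<>(firstIndex.keySet()), mapping);
    }

    public static void main(String[] args) {
        // Key list taken from the Reducer 2 Group By Operator in the plan above.
        Result r = dedup(Arrays.asList("KEY._col0", "KEY._col0", "'Foo'", "'Foo'"));
        System.out.println(r.uniqueKeys);                        // [KEY._col0, 'Foo']
        System.out.println(Arrays.toString(r.originalToUnique)); // [0, 0, 1, 1]
    }
}
```

With a mapping like this, the group-by would only need to hash and compare the two distinct keys, and the four output columns could be produced by projecting from them.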
      

People

    • Assignee: Unassigned
    • Reporter: Gopal Vijayaraghavan (gopalv)