Hive / HIVE-13872

Vectorization: Fix cross-product reduce sink serialization


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Version/s: 2.1.0, 2.2.0
    • Component/s: Vectorization
    • Labels: None

    Description

      With CBO disabled, TPC-DS Q13 is not simplified and produces a cross-product, which then fails in the vectorized Reduce Sink:

      Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0 projection column num 1
              at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:349)
              at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:267)
              at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRow(VectorExtractRow.java:343)
              at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:103)
              at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
              at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
              at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:762)
              ... 18 more
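      The exception appears to be raised while the vectorized Reduce Sink turns a row of the batch back into objects (VectorExtractRow.extractRowColumn). The sketch below is a hypothetical, simplified model in plain Java, not Hive's code: `StringColumnModel` and `extract` are illustrative names. It assumes the check distinguishes a row flagged as SQL NULL from a non-NULL row whose byte buffer was never filled in, and that only the latter produces an error worded like the one above.

```java
import java.nio.charset.StandardCharsets;

public class NullStringEntryDemo {

    // Hypothetical, simplified stand-in for a vectorized string column.
    // It is NOT Hive's BytesColumnVector; field names are illustrative only.
    static final class StringColumnModel {
        final byte[][] vector;  // per-row byte buffers; null if the row was never written
        final boolean[] isNull; // per-row SQL NULL flags
        boolean noNulls = true; // true when no row in the column is SQL NULL

        StringColumnModel(int size) {
            vector = new byte[size][];
            isNull = new boolean[size];
        }
    }

    // Reads one string value. A row flagged as SQL NULL yields null; a row that is
    // not flagged NULL but has no bytes is treated as bad input and raises an
    // error worded like the one in the stack trace (assumed, simplified check).
    static String extract(StringColumnModel col, int batchIndex, int projectionColumnNum) {
        if (!col.noNulls && col.isNull[batchIndex]) {
            return null;
        }
        byte[] bytes = col.vector[batchIndex];
        if (bytes == null) {
            throw new RuntimeException("null STRING entry: batchIndex " + batchIndex
                + " projection column num " + projectionColumnNum);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringColumnModel populated = new StringColumnModel(1);
        populated.vector[0] = "M".getBytes(StandardCharsets.UTF_8);
        System.out.println(extract(populated, 0, 1)); // prints M

        StringColumnModel sqlNull = new StringColumnModel(1);
        sqlNull.noNulls = false;
        sqlNull.isNull[0] = true;
        System.out.println(extract(sqlNull, 0, 1)); // prints null

        // Column that a downstream operator projects but nobody populated.
        StringColumnModel neverWritten = new StringColumnModel(1);
        try {
            extract(neverWritten, 0, 1);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      In this model a legitimate NULL is harmless; the failure only occurs when a projected column holds no data at all, which is the situation the message describes.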
      

      Simplified query

      set hive.cbo.enable=false;

      -- explain
      select count(1)
      from store_sales
          ,customer_demographics
      where (
              (customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
               and customer_demographics.cd_marital_status = 'M')
           or
              (customer_demographics.cd_demo_sk = ss_cdemo_sk
               and customer_demographics.cd_marital_status = 'U')
            );

      Relevant part of the explain plan:
              Map 3 
                  Map Operator Tree:
                      TableScan
                        alias: customer_demographics
                        Statistics: Num rows: 1920800 Data size: 717255532 Basic stats: COMPLETE Column stats: NONE
                        Reduce Output Operator
                          sort order: 
                          Statistics: Num rows: 1920800 Data size: 717255532 Basic stats: COMPLETE Column stats: NONE
                          value expressions: cd_demo_sk (type: int), cd_marital_status (type: string)
                  Execution mode: vectorized, llap
                  LLAP IO: all inputs
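      The description notes that CBO would normally simplify this query. As a hedged illustration in plain Java (not Hive's optimizer or executor), both disjuncts of the WHERE clause share the conjunct `cd_demo_sk = ss_cdemo_sk`; factoring it out gives `cd_demo_sk = ss_cdemo_sk and cd_marital_status in ('M', 'U')`, an ordinary equi-join condition. Without that rewrite the planner apparently finds no join key, which is consistent with the empty `sort order` on the Reduce Output Operator above and the resulting cross-product. The record types `Sale` and `Demo` and the sample rows are hypothetical; the sketch (Java 16+ for records) checks that the cross-product with the original filter and a hash join on the factored condition return the same count.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Q13PredicateDemo {
    // Hypothetical minimal row shapes for the two tables in the simplified query.
    record Sale(Integer ssCdemoSk) {}
    record Demo(int cdDemoSk, String cdMaritalStatus) {}

    // Original predicate: (join AND status = 'M') OR (join AND status = 'U').
    // A NULL join key never matches, as with SQL '=' in a WHERE clause.
    static boolean original(Sale s, Demo d) {
        boolean join = s.ssCdemoSk() != null && s.ssCdemoSk() == d.cdDemoSk();
        return (join && "M".equals(d.cdMaritalStatus()))
            || (join && "U".equals(d.cdMaritalStatus()));
    }

    // What the unsimplified plan computes: every (sale, demo) pair, then the filter.
    static long crossProductCount(List<Sale> sales, List<Demo> demos) {
        long count = 0;
        for (Sale s : sales) {
            for (Demo d : demos) {
                if (original(s, d)) {
                    count++;
                }
            }
        }
        return count;
    }

    // After factoring the shared conjunct out of the OR, the join key is explicit,
    // so a hash join on cd_demo_sk can replace the cross-product.
    static long factoredHashJoinCount(List<Sale> sales, List<Demo> demos) {
        Map<Integer, Long> matchesByKey = new HashMap<>();
        for (Demo d : demos) {
            String status = d.cdMaritalStatus();
            if ("M".equals(status) || "U".equals(status)) {
                matchesByKey.merge(d.cdDemoSk(), 1L, Long::sum);
            }
        }
        long count = 0;
        for (Sale s : sales) {
            if (s.ssCdemoSk() != null) {
                count += matchesByKey.getOrDefault(s.ssCdemoSk(), 0L);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Demo> demos = List.of(
            new Demo(1, "M"), new Demo(2, "S"), new Demo(3, "U"), new Demo(4, null));
        List<Sale> sales = List.of(
            new Sale(1), new Sale(2), new Sale(3), new Sale(3), new Sale(null), new Sale(4));
        System.out.println(crossProductCount(sales, demos));     // prints 3
        System.out.println(factoredHashJoinCount(sales, demos)); // prints 3
    }
}
```

      The factoring is also safe under SQL's three-valued logic: (A and B) or (A and C) equals A and (B or C) for Kleene AND/OR, so a row survives the WHERE clause in one form exactly when it survives in the other.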
      

      Attachments

        1. customer_demographics.txt (7 kB, Matt McCline)
        2. HIVE-13872.01.patch (96 kB, Matt McCline)
        3. HIVE-13872.02.patch (102 kB, Matt McCline)
        4. HIVE-13872.03.patch (100 kB, Matt McCline)
        5. HIVE-13872.04.patch (100 kB, Matt McCline)
        6. HIVE-13872.05.patch (101 kB, Matt McCline)
        7. HIVE-13872.WIP.patch (1 kB, Gopal Vijayaraghavan)
        8. vector_include_no_sel.q (3 kB, Matt McCline)
        9. vector_include_no_sel.q.out (16 kB, Matt McCline)

            People

              Matt McCline (mmccline)
              Gopal Vijayaraghavan (gopalv)