Hive / HIVE-15516

Unable to vectorize SELECT statement containing CASE WHEN with a GenericUDFOPGreaterThan expression


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate

    Description

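      The first query below is not vectorized: the Vectorizer rejects the map-side Select Operator because of the GenericUDFOPGreaterThan condition inside the CASE WHEN, so Map 1 runs with "Execution mode: llap" instead of "vectorized, llap". The same aggregation without the CASE WHEN expression (second query below) is vectorized.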

      hive> explain select sum(case when ss_quantity > 1 then ss_quantity * ss_wholesale_cost else 0 end) from store_sales;
      explain select sum(case when ss_quantity > 1 then ss_quantity * ss_wholesale_cost else 0 end) from store_sales
      OK
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            DagId: rbalamohan_20161227045137_c7a736c6-1812-4c8f-974e-7f7fcc7b1513:28
            Edges:
              Reducer 2 <- Map 1 (SIMPLE_EDGE)
            DagName:
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: store_sales
                        Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
                        Select Operator
                          expressions: CASE WHEN ((ss_quantity > 1)) THEN ((UDFToDouble(ss_quantity) * ss_wholesale_cost)) ELSE (0) END (type: double)
                          outputColumnNames: _col0
                          Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
                          Group By Operator
                            aggregations: sum(_col0)
                            mode: hash
                            outputColumnNames: _col0
                            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              sort order:
                              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col0 (type: double)
                  Execution mode: llap
                  LLAP IO: all inputs
              Reducer 2
                  Execution mode: vectorized, llap
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: sum(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        table:
                            input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
      
      
      ....
      ....
      2016-12-27T04:53:20,507  INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: MapWork Operator: SEL could not be vectorized.
      2016-12-27T04:53:20,507  INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: Unable to use the VectorUDFAdaptor. Encountered unsupported expr desc : GenericUDFOPGreaterThan(Column[ss_quantity], Const int 1)
      2016-12-27T04:53:20,507  INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: Cannot vectorize select expression: GenericUDFWhen(GenericUDFOPGreaterThan(Column[ss_quantity], Const int 1), GenericUDFOPMultiply(GenericUDFBridge ==> UDFToDouble (Column[ss_quantity]), Column[ss_wholesale_cost]), Const int 0)
      2016-12-27T04:53:20,507  INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: MapWork Operator: SEL could not be vectorized.
      ....
      ....
      
      
      hive> explain select sum(ss_quantity * ss_wholesale_cost) from store_sales;
      explain select sum(ss_quantity * ss_wholesale_cost) from store_sales
      OK
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            DagId: rbalamohan_20161227045112_8311df89-31fb-47ee-ad70-f702a85527cc:27
            Edges:
              Reducer 2 <- Map 1 (SIMPLE_EDGE)
            DagName:
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: store_sales
                        Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
                        Select Operator
                          expressions: (UDFToDouble(ss_quantity) * ss_wholesale_cost) (type: double)
                          outputColumnNames: _col0
                          Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
                          Group By Operator
                            aggregations: sum(_col0)
                            mode: hash
                            outputColumnNames: _col0
                            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              sort order:
                              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col0 (type: double)
                  Execution mode: vectorized, llap
                  LLAP IO: all inputs
              Reducer 2
                  Execution mode: vectorized, llap
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: sum(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        table:
                            input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
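
To illustrate what vectorizing the first query's SELECT expression involves, here is a minimal conceptual sketch in Python. It assumes a row batch is modeled as plain per-column lists with None standing in for SQL NULL; it is not Hive's VectorExpression code, and the function names (gt_scalar, multiply, select_where, vector_sum, case_when_sum) are hypothetical. The condition ss_quantity > 1 and both CASE branches are evaluated over whole columns, and the result is then picked row by row (TRUE takes THEN, FALSE or NULL takes ELSE) before summing.

```python
from typing import List, Optional

# Hypothetical column vector: one list per column of a row batch, None = SQL NULL.
Vec = List[Optional[float]]


def gt_scalar(col: Vec, scalar: float) -> List[Optional[bool]]:
    # Column-vs-constant comparison, like ss_quantity > 1; NULL input stays NULL.
    return [None if v is None else v > scalar for v in col]


def multiply(a: Vec, b: Vec) -> Vec:
    # Element-wise UDFToDouble(ss_quantity) * ss_wholesale_cost; NULL propagates.
    return [None if x is None or y is None else float(x) * y for x, y in zip(a, b)]


def select_where(cond: List[Optional[bool]], then_vec: Vec, else_vec: Vec) -> Vec:
    # CASE WHEN semantics: only a TRUE condition picks THEN; FALSE or NULL picks ELSE.
    return [t if c is True else e for c, t, e in zip(cond, then_vec, else_vec)]


def vector_sum(col: Vec) -> Optional[float]:
    # SQL SUM skips NULLs and returns NULL when no non-NULL values exist.
    vals = [v for v in col if v is not None]
    return sum(vals) if vals else None


def case_when_sum(quantity: Vec, cost: Vec) -> Optional[float]:
    # sum(CASE WHEN ss_quantity > 1 THEN ss_quantity * ss_wholesale_cost ELSE 0 END)
    cond = gt_scalar(quantity, 1)
    then_vec = multiply(quantity, cost)
    else_vec: Vec = [0.0] * len(quantity)
    return vector_sum(select_where(cond, then_vec, else_vec))


if __name__ == "__main__":
    print(case_when_sum([1, 2, None, 5], [10.0, 2.5, 3.0, 1.5]))  # 12.5
```

Evaluating both branches over the full batch and selecting afterwards keeps every step a tight loop over one column, which is the property a vectorized engine relies on; the cost is computing THEN values for rows that end up taking ELSE.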
      
      

      People

        Assignee: Unassigned
        Reporter: Rajesh Balamohan (rajesh.balamohan)
        Votes: 0
        Watchers: 2