Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Description
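The first query below is not vectorized on the map side: Map 1 runs with "Execution mode: llap" only, and the Vectorizer log reports that the Select operator (SEL) could not be vectorized. Without the CASE WHEN expression (second query), the same aggregation is vectorized.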
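For context, vectorized execution evaluates an expression one column batch at a time instead of one row at a time. Below is a hypothetical, simplified Python model (not Hive's VectorExpression code) of what batch evaluation of the failing expression involves: evaluate `ss_quantity > 1` over the whole batch, fill the output with the ELSE constant, then evaluate the THEN branch only on the selected rows. The function names and the "list per column" batch layout are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical, simplified model of batch-at-a-time evaluation.
# NOT Hive's VectorExpression code: each pass walks a whole column batch
# before the next pass starts, which is the basic idea of vectorization.

# (ss_quantity column, ss_wholesale_cost column) for one batch
Batch = Tuple[List[Optional[int]], List[Optional[float]]]

def case_when_batch(quantity: List[Optional[int]],
                    cost: List[Optional[float]]) -> List[Optional[float]]:
    """CASE WHEN q > 1 THEN CAST(q AS DOUBLE) * c ELSE 0 END for one batch."""
    n = len(quantity)
    # Pass 1: predicate over the whole batch. A NULL comparison is not TRUE,
    # so those rows fall through to the ELSE branch.
    selected = [i for i in range(n)
                if quantity[i] is not None and quantity[i] > 1]
    # Pass 2: ELSE branch fills the output with the constant 0 (typed double).
    out: List[Optional[float]] = [0.0] * n
    # Pass 3: THEN branch only on selected rows; NULL propagates through '*'.
    for i in selected:
        c = cost[i]
        out[i] = None if c is None else float(quantity[i]) * c
    return out

def sum_over_batches(batches: Sequence[Batch]) -> Optional[float]:
    """SUM skips NULLs and returns NULL only when it saw no non-NULL value."""
    total, seen = 0.0, False
    for quantity, cost in batches:
        for v in case_when_batch(quantity, cost):
            if v is not None:
                total += v
                seen = True
    return total if seen else None
```

For example, `case_when_batch([1, 2, None, 3], [5.0, 2.5, 4.0, None])` returns `[0.0, 5.0, 0.0, None]`.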
hive> explain select sum(case when ss_quantity > 1 then ss_quantity * ss_wholesale_cost else 0 end) from store_sales;
explain select sum(case when ss_quantity > 1 then ss_quantity * ss_wholesale_cost else 0 end) from store_sales
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: rbalamohan_20161227045137_c7a736c6-1812-4c8f-974e-7f7fcc7b1513:28
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName:
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: CASE WHEN ((ss_quantity > 1)) THEN ((UDFToDouble(ss_quantity) * ss_wholesale_cost)) ELSE (0) END (type: double)
                    outputColumnNames: _col0
                    Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
                    Group By Operator
                      aggregations: sum(_col0)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        sort order:
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col0 (type: double)
            Execution mode: llap
            LLAP IO: all inputs
        Reducer 2
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

....
....
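The comparison in this report comes down to the "Execution mode" line of each Tez vertex. A small helper can pull those lines out of EXPLAIN output; this is a hypothetical Python sketch that assumes the usual multi-line, indented EXPLAIN layout, with vertex headers such as "Map 1" and "Reducer 2" on their own lines. The function names are illustrative, not part of Hive.

```python
import re
from typing import Dict, List, Optional

# Vertex header alone on a line, e.g. "        Map 1 " (the edge line
# "Reducer 2 <- Map 1 (SIMPLE_EDGE)" does not match because of \s*$).
VERTEX_RE = re.compile(r"^\s*((?:Map|Reducer) \d+)\s*$")
# "Execution mode: vectorized, llap" -> "vectorized, llap"
MODE_RE = re.compile(r"^\s*Execution mode:\s*(.+?)\s*$")

def vertex_modes(explain_text: str) -> Dict[str, str]:
    """Map each Tez vertex name to the text of its 'Execution mode:' line."""
    modes: Dict[str, str] = {}
    current: Optional[str] = None
    for line in explain_text.splitlines():
        m = VERTEX_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = MODE_RE.match(line)
        if m and current is not None:
            modes[current] = m.group(1)
    return modes

def non_vectorized(explain_text: str) -> List[str]:
    """Vertices whose execution mode does not list 'vectorized'."""
    return sorted(
        v for v, mode in vertex_modes(explain_text).items()
        if "vectorized" not in [p.strip() for p in mode.split(",")]
    )
```

Applied to the first plan, it would flag Map 1 (mode `llap`) and not Reducer 2 (mode `vectorized, llap`).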
2016-12-27T04:53:20,507 INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: MapWork Operator: SEL could not be vectorized.
2016-12-27T04:53:20,507 INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: Unable to use the VectorUDFAdaptor. Encountered unsupported expr desc : GenericUDFOPGreaterThan(Column[ss_quantity], Const int 1)
2016-12-27T04:53:20,507 INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: Cannot vectorize select expression: GenericUDFWhen(GenericUDFOPGreaterThan(Column[ss_quantity], Const int 1), GenericUDFOPMultiply(GenericUDFBridge ==> UDFToDouble (Column[ss_quantity]), Column[ss_wholesale_cost]), Const int 0)
2016-12-27T04:53:20,507 INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: MapWork Operator: SEL could not be vectorized.
....
....

hive> explain select sum(ss_quantity * ss_wholesale_cost) from store_sales;
explain select sum(ss_quantity * ss_wholesale_cost) from store_sales
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: rbalamohan_20161227045112_8311df89-31fb-47ee-ad70-f702a85527cc:27
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName:
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: (UDFToDouble(ss_quantity) * ss_wholesale_cost) (type: double)
                    outputColumnNames: _col0
                    Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
                    Group By Operator
                      aggregations: sum(_col0)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        sort order:
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col0 (type: double)
            Execution mode: vectorized, llap
            LLAP IO: all inputs
        Reducer 2
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
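The Vectorizer log lines for the first query name the expression that blocked vectorization. Below is a hypothetical Python sketch for extracting those reasons from a Hive log, assuming the line shape shown in this report (`... physical.Vectorizer: <message>`). The message prefixes it matches are copied from the lines above and may differ in other Hive versions; the function name is illustrative.

```python
import re
from typing import List, Tuple

# Assumed line shape, taken from the log excerpt in this report:
# <timestamp> INFO [<id> main] physical.Vectorizer: <message>
LOG_RE = re.compile(r"physical\.Vectorizer:\s*(.*\S)")
UNSUPPORTED = "Encountered unsupported expr desc :"
SELECT_EXPR = "Cannot vectorize select expression:"

def vectorizer_reasons(log_text: str) -> List[Tuple[str, str]]:
    """Return (kind, detail) pairs for Vectorizer failure messages."""
    out: List[Tuple[str, str]] = []
    for line in log_text.splitlines():
        m = LOG_RE.search(line)
        if not m:
            continue  # not a Vectorizer line
        msg = m.group(1)
        if msg.startswith(SELECT_EXPR):
            # The full expression tree that could not be vectorized.
            out.append(("select-expr", msg[len(SELECT_EXPR):].strip()))
        elif UNSUPPORTED in msg:
            # The specific sub-expression the adaptor rejected.
            out.append(("unsupported-expr", msg.split(UNSUPPORTED, 1)[1].strip()))
        elif msg.endswith("could not be vectorized."):
            # Operator-level summary, e.g. "MapWork Operator: SEL ...".
            out.append(("operator", msg))
    return out
```

On the excerpt above, the "unsupported-expr" entry would be `GenericUDFOPGreaterThan(Column[ss_quantity], Const int 1)`, i.e. the `ss_quantity > 1` test inside the CASE WHEN.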