Details

- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
In cases like the following query, the Hive planner adds an extra UDFToDouble cast over integer columns. Instead of applying this cast in Hive, it can be pushed down to Druid by emitting a doubleSum aggregation instead of a longSum (and vice versa).
PREHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*) FROM druid_table GROUP BY floor_year(`__time`)
PREHOOK: type: QUERY
POSTHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*) FROM druid_table GROUP BY floor_year(`__time`)
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: druid_table
            properties:
              druid.query.json {"queryType":"timeseries","dataSource":"default.druid_table","descending":false,"granularity":"year","aggregations":[{"type":"longSum","name":"$f1","fieldName":"ctinyint"},{"type":"count","name":"$f2"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],"context":{"skipEmptyBuckets":true}}
              druid.query.type timeseries
            Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: __time (type: timestamp with local time zone), (UDFToDouble($f1) / UDFToDouble($f2)) (type: double)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
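For illustration, a sketch of the intended outcome (not actual Hive output): with the cast pushed down, the generated druid.query.json could sum ctinyint with a doubleSum aggregator instead of longSum, so $f1 already arrives as a double and the UDFToDouble in the Select Operator becomes unnecessary:

```json
{
  "queryType": "timeseries",
  "dataSource": "default.druid_table",
  "descending": false,
  "granularity": "year",
  "aggregations": [
    {"type": "doubleSum", "name": "$f1", "fieldName": "ctinyint"},
    {"type": "count", "name": "$f2"}
  ],
  "intervals": ["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],
  "context": {"skipEmptyBuckets": true}
}
```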
Issue Links

- Is contained by:
  - CALCITE-2170 Use Druid Expressions capabilities to improve the amount of work that can be pushed to Druid (Closed)
  - HIVE-18957 Upgrade Calcite version to 1.16.0 (Closed)