
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component: Druid integration

    Description

      In cases like the following query, the Hive planner adds an extra UDFToDouble cast over integer columns.
      This cast can instead be pushed down to Druid by emitting a doubleSum aggregator rather than a longSum (and vice versa), so the cast disappears from the Hive operator tree.

      PREHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*)
      FROM druid_table GROUP BY floor_year(`__time`)
      PREHOOK: type: QUERY
      POSTHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*)
      FROM druid_table GROUP BY floor_year(`__time`)
      POSTHOOK: type: QUERY
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Map Operator Tree:
                TableScan
                  alias: druid_table
                  properties:
                    druid.query.json {"queryType":"timeseries","dataSource":"default.druid_table","descending":false,"granularity":"year","aggregations":[{"type":"longSum","name":"$f1","fieldName":"ctinyint"},{"type":"count","name":"$f2"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],"context":{"skipEmptyBuckets":true}}
                    druid.query.type timeseries
                  Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    expressions: __time (type: timestamp with local time zone), (UDFToDouble($f1) / UDFToDouble($f2)) (type: double)
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
      
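      With the cast pushed down, the generated Druid query could carry the aggregation as a doubleSum directly. A sketch of what the rewritten druid.query.json might look like (illustrative only, not the actual patch output; only the aggregator type for $f1 changes):

        {
          "queryType": "timeseries",
          "dataSource": "default.druid_table",
          "descending": false,
          "granularity": "year",
          "aggregations": [
            {"type": "doubleSum", "name": "$f1", "fieldName": "ctinyint"},
            {"type": "count", "name": "$f2"}
          ],
          "intervals": ["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],
          "context": {"skipEmptyBuckets": true}
        }

      Since $f1 would already be a double, the Select Operator's division $f1 / $f2 would need only one cast (on the count) instead of two.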


          People

            Assignee: Slim Bouguerra
            Reporter: Slim Bouguerra
            Votes: 0
            Watchers: 2
