[CALCITE-1578] Druid adapter: wrong semantics of topN query limit with granularity - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.11.0
Fix Version/s: 1.12.0
Component/s: druid-adapter
Labels:
None

Description

Semantics of Druid topN query with limit and granularity is not equivalent to input SQL. In particular, limit is applied on each granularity value, not on the overall query.

Currently, the following query will be transformed into a topN query:

SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s DESC
LIMIT 10;

Previous query outputs at most 10 rows. In turn, the equivalent SQL query for a Druid topN query should be expressed as:

SELECT rs.i_brand_id, rs.d, rs.m, rs.s
FROM (
    SELECT i_brand_id, floor_day(`__time`) as d, max(ss_quantity) as m, sum(ss_wholesale_cost) as s,
           ROW_NUMBER() OVER (PARTITION BY floor_day(`__time`) ORDER BY sum(ss_wholesale_cost) DESC ) AS rownum
    FROM store_sales_sold_time_subset
    GROUP BY i_brand_id, floor_day(`__time`)
) rs
WHERE rownum <= 10;

Attachments

Issue Links

relates to

CALCITE-1591 Druid adapter: Use "groupBy" query with extractionFn for time dimension

Open

HIVE-15636 Hive/Druid integration: wrong semantics of topN query limit with granularity

Closed

Activity

People

Assignee:: Julian Hyde

Reporter:: Jesús Camacho Rodríguez

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 16/Jan/17 15:31

Updated:: 27/Feb/24 22:24

Resolved:: 20/Jan/17 00:05