Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1578

Druid adapter: wrong semantics of topN query limit with granularity

VotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 1.11.0
    • 1.12.0
    • druid-adapter
    • None

    Description

      Semantics of Druid topN query with limit and granularity is not equivalent to input SQL. In particular, limit is applied on each granularity value, not on the overall query.

      Currently, the following query will be transformed into a topN query:

      SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
      FROM store_sales_sold_time_subset
      GROUP BY i_brand_id, floor_day(`__time`)
      ORDER BY s DESC
      LIMIT 10;
      

      Previous query outputs at most 10 rows. In turn, the equivalent SQL query for a Druid topN query should be expressed as:

      SELECT rs.i_brand_id, rs.d, rs.m, rs.s
      FROM (
          SELECT i_brand_id, floor_day(`__time`) as d, max(ss_quantity) as m, sum(ss_wholesale_cost) as s,
                 ROW_NUMBER() OVER (PARTITION BY floor_day(`__time`) ORDER BY sum(ss_wholesale_cost) DESC ) AS rownum
          FROM store_sales_sold_time_subset
          GROUP BY i_brand_id, floor_day(`__time`)
      ) rs
      WHERE rownum <= 10;
      

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            julianhyde Julian Hyde
            jcamacho Jesús Camacho Rodríguez
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment