CALCITE-1587

Druid adapter: topN returns approximate results

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.11.0
    • Fix Version/s: 1.12.0
    • Component/s: druid
    • Labels:
      None

      Description

      Currently, the Druid adapter converts qualifying queries to Druid topN queries. However, the metrics that Druid returns for topN queries are approximate values. We should therefore probably stop converting to Druid topN queries and always use Druid groupBy instead.

          Activity

          gian Gian Merlino added a comment -

          In Druid's built-in SQL we make this an option, druid.sql.planner.useApproximateTopN. FWIW, we also have a similar option for whether COUNT(DISTINCT col) should be approximate or not.

          Also, topNs are exact if you are sorting on the dimension, and will be faster than groupBy in that case since groupBy doesn't yet push down limits all the way to the data nodes (although we are working on this). So it's still useful, and exact, to use them for queries like "SELECT DISTINCT foo FROM bar ORDER BY foo LIMIT 50". In Druid we do this even if druid.sql.planner.useApproximateTopN is false.

          The topN approximation is described in detail at http://druid.io/docs/latest/querying/topnquery.html#aliasing
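
          The rule described in this comment can be sketched as a small decision function. This is only an illustration, not Druid's or Calcite's actual planner code: the class QueryTypeChooser, its enums, and the allowApproximateTopN parameter are hypothetical names introduced here.

```java
import java.util.Objects;

/** Hypothetical sketch of the topN-versus-groupBy choice; not real planner code. */
final class QueryTypeChooser {
  enum QueryType { TOP_N, GROUP_BY }

  /** What a single-dimension ORDER BY ... LIMIT query sorts on. */
  enum SortTarget { DIMENSION, METRIC }

  static QueryType choose(SortTarget sortTarget, boolean allowApproximateTopN) {
    Objects.requireNonNull(sortTarget, "sortTarget");
    if (sortTarget == SortTarget.DIMENSION) {
      // Sorting on the dimension: topN is exact, so the flag does not matter.
      return QueryType.TOP_N;
    }
    // Sorting on a metric: topN may be approximate, so use it only on request.
    return allowApproximateTopN ? QueryType.TOP_N : QueryType.GROUP_BY;
  }
}
```

          Under these assumptions, a query such as "SELECT DISTINCT foo FROM bar ORDER BY foo LIMIT 50" corresponds to SortTarget.DIMENSION and maps to topN even when approximation is disallowed.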

          julianhyde Julian Hyde added a comment -

          I agree with Gian Merlino: we should enable approximation only if the user asks for it. In the Calcite framework I will add one property that controls whether to allow approximate topN, and another that controls whether to allow approximate distinct-count. I do not think it wise to expose these via a system or connection property until we also extend our SQL to let users ask for approximations in the query itself: see CALCITE-1588.

          To fix this issue, I will add the property, defaulting to false. For now, the only way to set it will be via code (e.g. from the test suite).

          julianhyde Julian Hyde added a comment (edited)

          Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/517bf62e.

          The new properties are called approximateDistinctCount and approximateTopN; see https://calcite.apache.org/docs/adapter.html#jdbc-connect-string-parameters
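
          The linked page covers JDBC connect-string parameters, which suggests the two properties can be supplied in the connect string. A minimal sketch under that assumption follows; the helper class ApproxConnect is hypothetical, and only the property names approximateTopN and approximateDistinctCount come from this issue.

```java
/** Hypothetical helper that builds a Calcite connect string; not part of Calcite. */
final class ApproxConnect {
  /**
   * Returns a connect string of the form jdbc:calcite:prop=value;prop2=value2
   * carrying the two approximation properties named in this issue.
   */
  static String url(boolean approximateTopN, boolean approximateDistinctCount) {
    return "jdbc:calcite:"
        + "approximateTopN=" + approximateTopN
        + ";approximateDistinctCount=" + approximateDistinctCount;
  }
}
```

          The resulting string would be passed to java.sql.DriverManager.getConnection with Calcite on the classpath. A real connection to Druid would also need a model or schema definition, which is omitted here.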

          julianhyde Julian Hyde added a comment -

          Resolved in release 1.12.0 (2017-03-24).


            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              jcamachorodriguez Jesus Camacho Rodriguez