Calcite / CALCITE-2035

Add APPROX_COUNT_DISTINCT aggregate function

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.15.0
    • Component/s: None
    • Labels:
      None

      Description

      Add APPROX_COUNT_DISTINCT aggregate function. The effect of APPROX_COUNT_DISTINCT(args) is the same as COUNT(DISTINCT args) but the planner may generate approximate results (e.g. by using HyperLogLog).

      Note "may" not "must", above: the planner may choose a plan that returns exact results.

      This is a step towards CALCITE-1588, which would allow an APPROXIMATE clause and specify in more detail the degree of approximation allowed.
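
The "may, not must" contract above can be illustrated with a small standalone sketch. It is not Calcite's implementation: the function names are hypothetical, and a K-minimum-values estimator stands in for HyperLogLog because it fits in a few lines. A caller of `approx_count_distinct` must accept either an exact or an approximate result.

```python
import hashlib
import heapq


def hash_to_unit_interval(value):
    """Map a value to a deterministic pseudo-random float in [0, 1]."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2.0**64


def exact_count_distinct(values):
    """Equivalent of COUNT(DISTINCT x): always exact."""
    return len(set(values))


def kmv_count_distinct(values, k=1024):
    """K-minimum-values estimate: keep the k smallest distinct hashes.

    If the k-th smallest hash is h_k, the number of distinct values is
    estimated as (k - 1) / h_k. With fewer than k distinct hashes the
    count is exact.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    heap = []      # negated hashes, so -heap[0] is the largest kept hash
    kept = set()   # the same hashes, for duplicate detection
    for value in values:
        h = hash_to_unit_interval(value)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)
    return round((k - 1) / -heap[0])


def approx_count_distinct(values, allow_approximate=True, k=1024):
    """Hypothetical stand-in for the planner's choice of plan.

    The caller only asks for APPROX_COUNT_DISTINCT; whether the answer
    comes from the sketch or from the exact plan is decided here.
    """
    values = list(values)
    if allow_approximate:
        return kmv_count_distinct(values, k)
    return exact_count_distinct(values)
```

With `k=1024` the relative error of the estimate is typically a few percent; inputs with fewer than `k` distinct values come back exact even on the approximate path.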

          Activity

          julianhyde Julian Hyde added a comment -

          Resolved in release 1.15.0 (2017-12-11).

          julianhyde Julian Hyde added a comment -

          Thanks for your review, Gian Merlino; I incorporated your suggestions.

          julianhyde Julian Hyde added a comment -

          Fixed in fe3529d9.

          gian Gian Merlino added a comment -

          Julian Hyde, I just took a look and only had a couple of small comments. The syntax and behavior contract look good to me.

          julianhyde Julian Hyde added a comment -

          It is done the same way as before. If you do count-distinct in Drill and you don't allow approximate, your query won't plan.

          I wrote the specification to give us freedom in future. I didn't change the implementation.

          bslim slim bouguerra added a comment -

          Julian Hyde, with respect to 'Note "may" not "must", above: the planner may choose a plan that returns exact results':
          is there a cost function that dictates which plan to choose? I couldn't see how that is done.
          Thanks.

          julianhyde Julian Hyde added a comment -

          Gian Merlino, slim bouguerra, Jesus Camacho Rodriguez, can one of you please review my proposed fix? It is at https://github.com/julianhyde/calcite/commits/2035-approx-count-distinct

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              julianhyde Julian Hyde
            • Votes:
              0
              Watchers:
              3
