IMPALA-2658 (project: IMPALA)

Extend the NDV function to accept a precision


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.2.4
    • Fix Version/s: Impala 4.0.0
    • Component/s: Backend

    Description

      The HyperLogLog algorithm used by NDV defaults to a precision of 10. Being able to set this precision would have two benefits:

      1. Lower precision can improve performance. HyperLogLog uses 2^p registers for precision p, so a precision of 9 has half as many registers as a precision of 10, and it may be just as accurate depending on the expected cardinality.
      2. Higher precision can help with very large cardinalities (in the 100 million to billion range) and will typically give more accurate estimates. Those who present estimates to end users will likely accept some performance cost in exchange for more accuracy, while still outperforming the naive approach by a large margin.
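      The register arithmetic behind both points can be sketched in a few lines of Python. This is a minimal illustration assuming the textbook HyperLogLog analysis, in which precision p gives m = 2^p registers and a relative standard error of roughly 1.04/sqrt(m). The helper names are hypothetical, and Impala's actual memory layout and error profile may differ.

```python
import math


def hll_registers(precision: int) -> int:
    # Textbook HyperLogLog: precision p selects 2**p registers.
    return 1 << precision


def hll_std_error(precision: int) -> float:
    # Relative standard error of the estimate is about 1.04 / sqrt(m).
    return 1.04 / math.sqrt(hll_registers(precision))


# Compare the default precision (10) with lower and higher settings.
for p in (9, 10, 14, 18):
    print(f"p={p:2d}  registers={hll_registers(p):7d}  "
          f"std_error~{hll_std_error(p):.2%}")
```

      Each step up in precision doubles the register count and shrinks the expected error by a factor of about sqrt(2), which is the speed/accuracy trade-off the two points above describe.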

      Proposal: add an overloaded function NDV(expression, int precision) that accepts precision values from 4 to 18 inclusive.
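      A minimal HyperLogLog estimator with a caller-supplied precision might look like the sketch below. It is not Impala's implementation (Impala's backend is C++). The function name `ndv`, the SHA-1-based 64-bit hash, and the error message are all hypothetical. Only the default precision of 10 and the proposed 4 to 18 range come from this issue. The sketch uses the standard alpha constants and the linear-counting correction for small cardinalities.

```python
import hashlib
import math

# Proposed accepted range for the precision argument (inclusive).
MIN_PRECISION, MAX_PRECISION = 4, 18


def _hash64(value) -> int:
    # Hypothetical hash choice: first 8 bytes of SHA-1 over repr(value).
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def ndv(values, precision: int = 10) -> int:
    """Estimate the number of distinct values with HyperLogLog."""
    if not MIN_PRECISION <= precision <= MAX_PRECISION:
        raise ValueError(
            f"precision must be between {MIN_PRECISION} and {MAX_PRECISION}")
    m = 1 << precision                 # number of registers
    tail_bits = 64 - precision
    registers = [0] * m
    for v in values:
        h = _hash64(v)
        idx = h >> tail_bits                    # top p bits pick a register
        rest = h & ((1 << tail_bits) - 1)       # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit in the remaining bits.
        rank = tail_bits - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    # Standard bias-correction constants from the HyperLogLog paper.
    if m >= 128:
        alpha = 0.7213 / (1 + 1.079 / m)
    else:
        alpha = {16: 0.673, 32: 0.697, 64: 0.709}[m]
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return round(m * math.log(m / zeros))
    return round(raw)
```

      For example, `ndv(range(100_000), precision=14)` uses 16384 registers instead of the default 1024, while `ndv(values, precision=3)` is rejected with a `ValueError` because it falls outside the proposed range.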


            People

              Assignee: Qifan Chen (sql_forever)
              Reporter: Peter Ebert (PeterEbert)
              Votes: 0
              Watchers: 12
