Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Version: Impala 2.2.4
Description
The HyperLogLog algorithm used by NDV defaults to a precision of 10. Being able to set this precision would have two benefits:
- Lower precision can improve performance. A sketch with precision p has 2^p registers, so a precision of 9 has half as many registers as a precision of 10 (the register count grows exponentially with precision), and it may be just as accurate depending on the expected cardinality.
- Higher precision can help with very large cardinalities (in the 100 million to 1 billion range) and typically produces more accurate estimates. Those presenting estimates to end users will likely accept some performance cost in exchange for more accuracy, while still outperforming the naive exact approach by a large margin.
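The trade-off described above can be sketched numerically. The snippet below is an illustration only, not Impala code: it assumes the standard HyperLogLog properties that precision p gives m = 2^p registers and that the relative standard error is roughly 1.04 / sqrt(m). The helper names `hll_registers` and `hll_std_error` are hypothetical.

```python
import math


def hll_registers(precision: int) -> int:
    """Number of HyperLogLog registers for precision p: m = 2**p."""
    return 1 << precision


def hll_std_error(precision: int) -> float:
    """Approximate relative standard error of a HyperLogLog estimate: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(hll_registers(precision))


# Each +1 in precision doubles the registers (memory and merge work)
# but only shrinks the standard error by a factor of sqrt(2).
for p in (9, 10, 14, 18):
    print(f"p={p:2d}  registers={hll_registers(p):6d}  std error ~ {hll_std_error(p):.2%}")
```

For example, the default precision of 10 corresponds to 1,024 registers and a standard error of about 3.25%, while dropping to 9 halves the registers at the cost of roughly 1.4 times the error.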
Proposal: add an overloaded function NDV(expression, int precision) that accepts precision values from 4 to 18 inclusive.
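To make the proposed signature concrete, here is a minimal Python sketch of an NDV-style estimator that takes a precision argument and rejects values outside 4 to 18. It is not Impala's C++ implementation: the function name `ndv`, the helper `_hash64` (a BLAKE2b-based 64-bit hash), and the use of the original HyperLogLog paper's bias constants and linear-counting small-range correction are all assumptions chosen for illustration.

```python
import hashlib
import math

# Precision bounds proposed in this issue for NDV(expression, int precision).
MIN_PRECISION, MAX_PRECISION = 4, 18


def _hash64(value) -> int:
    """Hypothetical 64-bit hash (BLAKE2b); Impala uses its own hash function."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def _alpha(m: int) -> float:
    """Bias-correction constant from the original HyperLogLog paper."""
    if m == 16:
        return 0.673
    if m == 32:
        return 0.697
    if m == 64:
        return 0.709
    return 0.7213 / (1 + 1.079 / m)


def ndv(values, precision: int = 10) -> int:
    """Hypothetical NDV(expression, precision): HyperLogLog with 2**precision registers."""
    if not MIN_PRECISION <= precision <= MAX_PRECISION:
        raise ValueError(
            f"precision must be between {MIN_PRECISION} and {MAX_PRECISION}, got {precision}"
        )
    m = 1 << precision
    tail_bits = 64 - precision
    registers = [0] * m
    for v in values:
        h = _hash64(v)
        idx = h >> tail_bits                       # top `precision` bits select a register
        rest = h & ((1 << tail_bits) - 1)          # remaining bits determine the rank
        rank = tail_bits - rest.bit_length() + 1   # 1-based position of the leftmost 1-bit
        if rank > registers[idx]:
            registers[idx] = rank
    estimate = _alpha(m) * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros > 0:
        # Small-range correction: fall back to linear counting.
        estimate = m * math.log(m / zeros)
    return round(estimate)


print(ndv(range(20_000)))                # default precision 10
print(ndv(range(20_000), precision=14))  # 16x the registers, about 1/4 the standard error
```

Raising the precision from 10 to 14 multiplies register memory by 16 while cutting the expected standard error to about a quarter, which is the accuracy-for-cost trade the description argues some users would accept.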
Attachments
Issue Links
- is depended upon by: IMPALA-5449 Implement LinearCounting Functionality of HyperLogLog++ for small cardinalities (Reopened)
- is related to: IMPALA-10538 Document the newly added scale argument of ndv function (Resolved)