Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Impala 4.0.0
None
Description
Several new functions were recently added to support Apache DataSketches KLL calculations. Their purpose is to give approximate quantile boundaries for a given dataset.
KLL is an implementation of a very compact quantiles sketch with a lazy compaction scheme and nearly optimal accuracy per bit.
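To make the lazy compaction idea concrete, here is a minimal Python sketch of a simplified KLL-style sketch. It is not the Apache DataSketches code that backs the Impala functions and does not reproduce its serialization format. The class name KllSketch, the parameter k (default 200), the 2/3 capacity shrink factor and the choice of which item stays behind on an odd-sized compaction are all illustrative assumptions. Each level is a buffer whose items stand for 2**h input values; only when the whole sketch exceeds its budget is the lowest full level sorted and halved, keeping a randomly chosen half of its items at double weight.

```python
import math
import random


class KllSketch:
    """Simplified KLL-style quantiles sketch (illustration only, not DataSketches).

    Items stored at level h each represent 2**h of the original input values.
    """

    def __init__(self, k=200, seed=None):
        self.k = k                 # accuracy parameter: capacity of the top level
        self.c = 2.0 / 3.0         # each lower level gets 2/3 of the level above
        self.levels = [[]]         # levels[0] receives raw, weight-1 inserts
        self.n = 0                 # number of values inserted so far
        self.rng = random.Random(seed)

    def _capacity(self, level):
        depth = len(self.levels) - 1 - level      # distance from the top level
        return max(2, math.ceil(self.k * self.c ** depth))

    def _size(self):
        return sum(len(items) for items in self.levels)

    def _total_capacity(self):
        return sum(self._capacity(h) for h in range(len(self.levels)))

    def update(self, value):
        self.levels[0].append(value)
        self.n += 1
        # Lazy compaction: do nothing until the whole sketch exceeds its budget.
        while self._size() > self._total_capacity():
            self._compact()

    def _compact(self):
        # Pick the lowest level at or above its own capacity; one must exist,
        # because the total size exceeds the total capacity.
        h = next(i for i, items in enumerate(self.levels)
                 if len(items) >= self._capacity(i))
        if h + 1 == len(self.levels):
            self.levels.append([])                # grow a new top level
        items = sorted(self.levels[h])
        # Compact an even number of items; an odd one out stays at level h.
        leftover = [items.pop()] if len(items) % 2 else []
        offset = self.rng.randint(0, 1)           # keep even- or odd-indexed items
        self.levels[h + 1].extend(items[offset::2])
        self.levels[h] = leftover

    def _weighted_items(self):
        # (value, weight) pairs in ascending value order.
        return sorted((x, 1 << h) for h, items in enumerate(self.levels)
                      for x in items)

    def rank(self, value):
        """Estimated fraction of inserted values strictly less than `value`."""
        below = sum(w for x, w in self._weighted_items() if x < value)
        return below / self.n

    def quantile(self, fraction):
        """Smallest retained item whose cumulative weight reaches fraction * n."""
        pairs = self._weighted_items()
        target = fraction * self.n
        cumulative = 0
        for x, w in pairs:
            cumulative += w
            if cumulative >= target:
                return x
        return pairs[-1][0]


sketch = KllSketch(k=200, seed=42)
for v in range(10_000):
    sketch.update(float(v))
print(sketch.n, sketch._size(), sketch.quantile(0.5), sketch.rank(5000.0))
```

Because every compaction turns 2m items of weight w into m items of weight 2w, the total weight always equals n exactly, while rank and quantile answers carry a small random error that shrinks as k grows. The sketch retains a few hundred items instead of all 10,000 values.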
The newly introduced functions are:
ds_kll_sketch()
ds_kll_quantile()
ds_kll_quantiles_as_string()
ds_kll_n()
ds_kll_union()
ds_kll_rank()
ds_kll_pmf_as_string()
ds_kll_cdf_as_string()
ds_kll_stringify()
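The list above gives only function names. As a reading aid for the documentation, the following Python sketch computes exactly, over an in-memory list, the quantities that ds_kll_quantile(), ds_kll_rank(), ds_kll_pmf_as_string() and ds_kll_cdf_as_string() approximate from a KLL sketch, with len(data) standing in for ds_kll_n(). The helper names are hypothetical, and two conventions are assumptions rather than facts taken from this issue: ranks count values strictly less than the query value, and PMF bins are half-open intervals bounded by strictly increasing split points. The real functions operate on a serialized sketch rather than raw values and, as the _as_string suffix suggests, return their multi-value results as text.

```python
import bisect


def exact_rank(data, value):
    """Fraction of values strictly less than `value` (assumed rank convention)."""
    s = sorted(data)
    return bisect.bisect_left(s, value) / len(s)


def exact_quantile(data, fraction):
    """Value at normalized rank `fraction`, where 0.0 <= fraction <= 1.0."""
    s = sorted(data)
    return s[min(int(fraction * len(s)), len(s) - 1)]


def exact_pmf(data, split_points):
    """Fractions of values in (-inf, s1), [s1, s2), ..., [sm, +inf).

    `split_points` must be strictly increasing.
    """
    s = sorted(data)
    edges = [0] + [bisect.bisect_left(s, p) for p in split_points] + [len(s)]
    return [(hi - lo) / len(s) for lo, hi in zip(edges, edges[1:])]


def exact_cdf(data, split_points):
    """Fraction of values below each split point, followed by 1.0."""
    s = sorted(data)
    return [bisect.bisect_left(s, p) / len(s) for p in split_points] + [1.0]


data = [float(v) for v in range(1, 11)]        # 1.0 .. 10.0
print(len(data))                               # stands in for ds_kll_n() (assumed)
print(exact_quantile(data, 0.5), exact_rank(data, 6.0))        # 6.0 0.5
print(exact_pmf(data, [3.0, 8.0]), exact_cdf(data, [3.0, 8.0]))
```

On a sketch, each of these answers is an estimate whose error is bounded in rank terms, so documentation examples should avoid promising exact values.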
Related Jiras:
https://issues.apache.org/jira/browse/IMPALA-9959
https://issues.apache.org/jira/browse/IMPALA-9962
https://issues.apache.org/jira/browse/IMPALA-9963
https://issues.apache.org/jira/browse/IMPALA-10017
https://issues.apache.org/jira/browse/IMPALA-10018
https://issues.apache.org/jira/browse/IMPALA-10019
https://issues.apache.org/jira/browse/IMPALA-10020
https://issues.apache.org/jira/browse/IMPALA-10108
We should document these functions and mark them as experimental so that users can try them out and, hopefully, give feedback.