[IMPALA-10445] The ability to adjust NDV's precision with query option - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: Impala 4.0.0
Fix Version/s: Impala 4.0.0
Component/s: Frontend
Labels: None

Target Version: Impala 4.0.0

Description

Since IMPALA-2658, we can trade memory for more accurate NDV estimation. This is attractive because tests show an error rate within 0.1% with no significant rise in resource usage (#registers is 2 << 18). Users should have fewer complaints about computation precision in the future.
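To make the memory/precision trade-off concrete, here is a minimal sketch that assumes NDV() is backed by a HyperLogLog-style sketch, for which the theoretical relative standard error is roughly 1.04 / sqrt(#registers). The function name and the register counts are illustrative only and are not taken from the Impala code base; observed error on real data can differ from this theoretical figure.

```python
import math


def hll_standard_error(num_registers: int) -> float:
    """Theoretical relative standard error of a HyperLogLog sketch.

    Assumption: the estimator follows the classic HyperLogLog bound
    of about 1.04 / sqrt(m), where m is the number of registers.
    """
    if num_registers <= 0:
        raise ValueError("num_registers must be positive")
    return 1.04 / math.sqrt(num_registers)


# Illustrative register counts; 2 << 18 is the figure quoted in the description.
for m in (1 << 10, 1 << 14, 2 << 18):
    error_pct = hll_standard_error(m) * 100
    print(f"{m:>7} registers -> ~{error_pct:.2f}% standard error")
```

Each fourfold increase in registers halves the expected error, which is why a larger sketch costs more memory per NDV() call but narrows the estimate.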

However, the road to applying high-precision NDV in a production environment is not smooth.

1) We would have to rewrite the SQL of a large number of historical workloads, which is time-consuming and error-prone.

2) Cluster users, i.e. the SQL writers, are reluctant to lower their expectations. It would be more convenient to give cluster admins a way to adjust precision for each Admission Control queue according to the cluster's resource usage (roughly speaking).

Proposal:

Add a new query option, DEFAULT_NDV_SCALE, to change the default precision setting for NDV().

Implementation:

Add a query option in the frontend (FE).
If the option is set, use the matching NDV(<expr>, <scale>) function instead of NDV().
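The two implementation steps can be sketched as a small rewrite rule. This is a hypothetical Python illustration, not the actual Impala frontend code (which is written in Java); the function name `apply_default_ndv_scale` and the list-of-strings argument representation are assumptions made for the example. It also assumes that a call which already passes an explicit scale should be left untouched.

```python
from typing import List, Optional


def apply_default_ndv_scale(
    func_name: str,
    args: List[str],
    default_ndv_scale: Optional[int],
) -> List[str]:
    """Return the argument list a function call should be analyzed with.

    Hypothetical helper: if the DEFAULT_NDV_SCALE option is set and the call
    is a single-argument NDV(<expr>), append the scale so that the
    two-argument NDV(<expr>, <scale>) variant is chosen instead.
    """
    is_plain_ndv = func_name.lower() == "ndv" and len(args) == 1
    if default_ndv_scale is None or not is_plain_ndv:
        # Option unset, a different function, or an explicit scale was given.
        return list(args)
    return [args[0], str(default_ndv_scale)]


# NDV(c1) with the option set to 10 is treated as NDV(c1, 10).
print(apply_default_ndv_scale("ndv", ["c1"], 10))
# An explicit scale in the query is kept as written.
print(apply_default_ndv_scale("ndv", ["c1", "5"], 10))
```

Doing the substitution in the frontend means existing queries need no edits, and the value can be set per session or per resource pool wherever query options are normally configured.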

Issue Links

relates to

IMPALA-1187 Optionally auto transform multiple count distincts as NDV (Resolved)

IMPALA-110 Add support for multiple distinct operators in the same query block (Resolved)

People

Assignee: Fifteen

Reporter: Fifteen

Votes: 0

Watchers: 3

Dates

Created: 18/Jan/21 05:31

Updated: 27/Apr/21 13:55

Resolved: 22/Apr/21 09:17