[PHOENIX-4164] APPROX_COUNT_DISTINCT becomes imprecise at 20m unique values. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Invalid
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

0: jdbc:phoenix:localhost> select count(*) from test;
+-----------+
| COUNT(1)  |
+-----------+
| 26931816  |
+-----------+
1 row selected (14.604 seconds)
0: jdbc:phoenix:localhost> select approx_count_distinct(v1) from test;
+----------------------------+
| APPROX_COUNT_DISTINCT(V1)  |
+----------------------------+
| 17221394                   |
+----------------------------+
1 row selected (21.619 seconds)
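
To put a number on the gap between the two results above, here is a minimal Python sketch. It only reuses the two figures from the console output; the helper name `shortfall` is made up for illustration. Note that the row count is an upper bound on the true distinct count, so the value computed here is an upper bound on how far the estimate undershoots.

```python
# Figures copied from the sqlline output above.
row_count = 26_931_816        # SELECT COUNT(*) FROM test
approx_distinct = 17_221_394  # SELECT APPROX_COUNT_DISTINCT(v1) FROM test


def shortfall(estimate: int, reference: int) -> float:
    """Fraction by which `estimate` falls below `reference` (hypothetical helper)."""
    return (reference - estimate) / reference


gap = shortfall(approx_distinct, row_count)
print(f"approximate count is {gap:.1%} below the row count")
```

If v1 really has close to one distinct value per row, the estimate is off by roughly a third.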

The table is generated from random numbers, and the cardinality of v1 is close to the number of rows.
(I cannot run a COUNT(DISTINCT(v1)) to confirm this, as it uses up all the memory on my machine and eventually kills the regionserver. That's another story and another JIRA.)
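
Since the exact COUNT(DISTINCT) cannot be run here, one cheap sanity check on the "cardinality is close to the number of rows" claim is the expected number of distinct values when n values are drawn uniformly at random from a range of N possible values: N * (1 - (1 - 1/N)^n). The sketch below is Python, not Phoenix SQL. The issue does not say how the random values were generated, so the range sizes used here are purely hypothetical.

```python
import math


def expected_distinct(n_draws: int, range_size: int) -> float:
    """Expected number of distinct values after n_draws uniform draws
    from range_size equally likely values: N * (1 - (1 - 1/N)**n).

    Written with log1p/expm1 so it stays accurate when N is very large.
    """
    return -range_size * math.expm1(n_draws * math.log1p(-1.0 / range_size))


n = 26_931_816  # row count from the COUNT(*) query above

# Hypothetical generator ranges; the real range is not stated in the issue.
for bits in (24, 31, 63):
    est = expected_distinct(n, 2 ** bits)
    print(f"range 2^{bits}: expected distinct = {est:,.0f} ({est / n:.1%} of rows)")
```

A wide range keeps the expected cardinality within a rounding error of the row count, while a range comparable in size to the row count produces many duplicates. Which case applies depends on how the test data was generated.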

aertoria

Issue Links

relates to: PHOENIX-4160 research for a proper hash size set for APPROX_COUNT_DISTINCT (Open)

People

Assignee: Unassigned
Reporter: Lars Hofhansl
Votes: 0
Watchers: 2

Dates

Created: 06/Sep/17 05:06
Updated: 09/Sep/17 05:51
Resolved: 09/Sep/17 05:29