Details
Bug
Status: Closed
Major
Resolution: Fixed
5.1
None
None
Description
The main problem is mincount... in a non-distrib query, numBuckets reflects the number of buckets that remain after mincount is applied (buckets below mincount are screened out). In distributed mode, we can't do this (or rather, the only way to do it would be to transmit all bucket counts to an aggregator node).
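To make the distributed problem concrete, here is a minimal sketch in Python (not Solr code; the shard contents and the helper `local_num_buckets` are made up for illustration). A bucket can fall below mincount on every shard and still reach it once the per-shard counts are merged, so per-shard post-mincount numbers can't be combined into the right answer.

```python
from collections import Counter

def local_num_buckets(counts, mincount):
    """Number of buckets whose count reaches mincount (hypothetical helper)."""
    return sum(1 for c in counts.values() if c >= mincount)

# Hypothetical per-shard bucket counts for one facet field.
shard_a = Counter({"red": 1, "green": 2, "blue": 1})
shard_b = Counter({"red": 1, "green": 1, "yellow": 1})
mincount = 2

# What each shard would report if it applied mincount locally.
per_shard = [local_num_buckets(s, mincount) for s in (shard_a, shard_b)]

# The correct post-mincount value needs the merged count of every bucket,
# i.e. all bucket counts shipped to the aggregator.
merged = shard_a + shard_b
true_num_buckets = local_num_buckets(merged, mincount)

# "red" has count 1 on each shard (below mincount) but 2 after merging,
# so the per-shard values undercount.
print(per_shard, true_num_buckets)
```

Here the per-shard values sum to 1 while the merged result is 2, which is why the exact post-mincount number needs every bucket count at the aggregator.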
We should perhaps just make numBuckets always pre-mincount to be consistent, and use HyperLogLog by default?
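For reference, a rough sketch of why HyperLogLog fits the pre-mincount case: each shard builds a small fixed-size sketch of its bucket values, and the aggregator merges sketches with a register-wise max instead of receiving every bucket. This is a toy Python version (the `TinyHLL` class, its precision `p=10`, and the SHA-1 hashing are assumptions for illustration, not what Solr uses).

```python
import hashlib
import math

class TinyHLL:
    """Toy HyperLogLog for illustration only; not Solr's implementation."""

    def __init__(self, p=10):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash
        idx = h >> (64 - self.p)                     # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Merging two sketches is a register-wise max.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)        # standard constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw

# Two hypothetical shards with overlapping bucket values (5000 distinct overall).
shard_a = TinyHLL()
shard_b = TinyHLL()
for i in range(0, 3000):
    shard_a.add(f"term{i}")
for i in range(2000, 5000):
    shard_b.add(f"term{i}")

# The aggregator only needs the fixed-size sketches, not the buckets themselves.
aggregator = TinyHLL()
aggregator.merge(shard_a)
aggregator.merge(shard_b)
approx_num_buckets = aggregator.estimate()
```

The merged sketch is identical to one built from all values on a single node, so the distributed and non-distributed numbers agree, at the cost of numBuckets being an estimate rather than an exact count.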