Details
- New Feature
- Status: Closed
- Major
- Resolution: Fixed
- v1.1
- None
Description
For now, Kylin only supports approximate (non-precise) count distinct, implemented with HyperLogLog.
In our production scenarios there is a strong requirement for precise count distinct, mainly on columns of type int or bigint, such as user-id, product-id, etc.
Implementing precise count distinct for all types is difficult and inefficient. However, supporting only int or bigint makes this much easier. The values can be projected into a bitmap, which is easy to compress and store, and easy to count.
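To illustrate the bitmap idea, here is a minimal sketch. It uses the JDK's `java.util.BitSet` as an uncompressed stand-in for a compressed bitmap, and it assumes non-negative int ids. The class name `IntBitmapCounter` is hypothetical and is not part of Kylin.

```java
import java.util.BitSet;

// Hypothetical illustration only: exact count distinct over int ids using a bitmap.
// java.util.BitSet is uncompressed; a compressed bitmap would be preferable in practice.
final class IntBitmapCounter {
    private final BitSet bits = new BitSet();

    // Record one value; a duplicate sets the same bit again and is counted once.
    void add(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch assumes non-negative ids");
        }
        bits.set(value);
    }

    // Merging two partial aggregates is a bitwise OR of their bitmaps.
    void merge(IntBitmapCounter other) {
        bits.or(other.bits);
    }

    // The exact distinct count is the number of set bits.
    int count() {
        return bits.cardinality();
    }

    public static void main(String[] args) {
        IntBitmapCounter counter = new IntBitmapCounter();
        counter.add(7);
        counter.add(42);
        counter.add(7); // duplicate
        System.out.println(counter.count()); // prints 2
    }
}
```

Because merging is a plain OR, partial results from different segments can be combined without losing precision, which is what makes the bitmap usable as an aggregation state.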
I've created a POC based on RoaringBitmap, which proves that the approach works. There is some more work to be done:
- RoaringBitmap only supports int, so a solution is needed to support bigint;
- Add a new measure and codec, similar to HyperLogLogPlusCounter, to make it easy to use;
- Add the new measure to the web UI, and check whether the column type is int or bigint.
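One possible direction for the bigint item, sketched under assumptions: split each 64-bit value into a high part used as a map key and a low 31-bit part stored in a per-key bitmap. This is not the POC's code and not necessarily what was finally implemented. `LongBitmapCounter` is a hypothetical name, and `java.util.BitSet` again stands in for a compressed int bitmap.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: exact count distinct over 64-bit values.
// Each value is split into its high 33 bits (map key) and low 31 bits (bit index).
final class LongBitmapCounter {
    private final Map<Long, BitSet> buckets = new TreeMap<>();

    void add(long value) {
        long high = value >>> 31;              // unsigned shift, so negatives get their own keys
        int low = (int) (value & 0x7FFFFFFFL); // always a valid non-negative BitSet index
        buckets.computeIfAbsent(high, k -> new BitSet()).set(low);
    }

    // Merge bucket by bucket; each bucket merge is a bitwise OR.
    void merge(LongBitmapCounter other) {
        for (Map.Entry<Long, BitSet> e : other.buckets.entrySet()) {
            buckets.computeIfAbsent(e.getKey(), k -> new BitSet()).or(e.getValue());
        }
    }

    // The exact distinct count is the sum of set bits across all buckets.
    long count() {
        long total = 0;
        for (BitSet bucket : buckets.values()) {
            total += bucket.cardinality();
        }
        return total;
    }
}
```

Note that an uncompressed `BitSet` grows to cover the largest index set in it, so a value with a large low part makes its bucket large; a compressed bitmap per bucket would avoid that cost.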
Issue Links
- depends upon
  - KYLIN-976 Support Custom Aggregation Types (Closed)
- is related to
  - KYLIN-1379 More stable and functional precise count distinct implements after KYLIN-1186 (Closed)