Details
Type: Epic
Status: Open
Priority: Major
Resolution: Unresolved
Epic Name: datasketches-theta
Description
Implement a count(distinct) function in C++, backed by the Theta sketch from the DataSketches library.
The Theta sketch provides approximate distinct counting and supports set operations (union, intersection, and set difference).
This can be used for retention analysis, e.g. "How many unique users signed up in week 1 and purchased something in week 2?"
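To make the retention example concrete, here is a minimal, self-contained sketch of the idea behind a theta sketch and its intersection. It is a toy for illustration only and does not use the datasketches-cpp API. The names ToyThetaSketch, retained_users_estimate, the "userN" ids, and the default k = 4096 are all assumptions made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iterator>
#include <limits>
#include <set>
#include <string>

// Threshold value meaning "no sampling yet": every hash is retained.
constexpr std::uint64_t kMaxTheta = std::numeric_limits<std::uint64_t>::max();

// Toy theta sketch (hypothetical, NOT the datasketches-cpp API).
// It retains at most k hash values, all strictly below a threshold theta.
class ToyThetaSketch {
public:
    explicit ToyThetaSketch(std::size_t k = 4096) : k_(k) { assert(k > 0); }

    // Hash the item and retain the hash if it is below the current threshold.
    void update(const std::string& item) {
        insert(mix(std::hash<std::string>{}(item)));
    }

    // Estimated number of distinct items represented by this sketch.
    double estimate() const {
        if (theta_ == kMaxTheta) {
            return static_cast<double>(hashes_.size());  // exact mode
        }
        // Retained hashes form a uniform sample with rate theta / 2^64.
        const double rate = static_cast<double>(theta_) / 18446744073709551616.0;
        return static_cast<double>(hashes_.size()) / rate;
    }

    // Intersection: use the smaller threshold, keep hashes present in both.
    static ToyThetaSketch intersect(const ToyThetaSketch& a, const ToyThetaSketch& b) {
        ToyThetaSketch out(a.k_ < b.k_ ? a.k_ : b.k_);
        out.theta_ = a.theta_ < b.theta_ ? a.theta_ : b.theta_;
        for (std::uint64_t h : a.hashes_) {
            if (h < out.theta_ && b.hashes_.count(h) > 0) out.hashes_.insert(h);
        }
        return out;
    }

private:
    void insert(std::uint64_t h) {
        if (h >= theta_) return;
        hashes_.insert(h);
        if (hashes_.size() > k_) {
            // Over capacity: lower theta to the largest retained hash and drop it.
            auto largest = std::prev(hashes_.end());
            theta_ = *largest;
            hashes_.erase(largest);
        }
    }

    // Bit mixer (splitmix64 finalizer) to spread std::hash output over 64 bits.
    static std::uint64_t mix(std::uint64_t x) {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return x;
    }

    std::size_t k_;
    std::uint64_t theta_ = kMaxTheta;
    std::set<std::uint64_t> hashes_;
};

// Hypothetical retention query: signed up in week 1 AND purchased in week 2.
double retained_users_estimate() {
    ToyThetaSketch signed_up, purchased;
    for (int i = 0; i < 100; ++i) signed_up.update("user" + std::to_string(i));
    for (int i = 50; i < 150; ++i) purchased.update("user" + std::to_string(i));
    return ToyThetaSketch::intersect(signed_up, purchased).estimate();
}
```

In this toy setup both sketches hold fewer than k distinct ids, so they stay in exact mode and retained_users_estimate() returns 50. Once a sketch sees more than k distinct items, theta drops below its maximum and the result becomes an estimate. The real implementation should wrap the library classes linked below rather than reimplement the sketch.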
General info about the sketch:
https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html
C++ implementation to wrap:
https://github.com/apache/datasketches-cpp/tree/master/theta
Using thetaSketch in Druid:
https://druid.apache.org/docs/latest/development/extensions-core/datasketches-theta.html