IMPALA-10439

Implement count(distinct) function (DataSketches/Theta)

Details

    • Type: Epic
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None
    • Epic Name: datasketches-theta

    Description

      Implement an approximate count(distinct) function in C++, backed by the Theta sketch from the DataSketches library.

      The Theta sketch provides approximate distinct counting and supports set operations (union, intersection, and set difference).
      This can be used for retention analysis, e.g. "How many unique users signed up in week 1 and purchased something in week 2?"
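
To make the retention question concrete, here is a minimal, self-contained sketch of the idea behind a Theta sketch. It is a toy illustration under stated assumptions, not the datasketches-cpp API and not the proposed Impala implementation: the class `ToyThetaSketch`, the hash `mix64`, and the helper `retention_example` are hypothetical names introduced only for this example. The sketch keeps the k smallest hash values below a threshold theta and estimates the distinct count as the number of retained hashes divided by the fraction of the hash space that theta covers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <limits>
#include <set>

// Hypothetical stand-in hash (SplitMix64 finalizer), not the library's hash.
inline std::uint64_t mix64(std::uint64_t x) {
  x += 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

// Toy Theta sketch: retains at most k hashes, all strictly below theta_.
class ToyThetaSketch {
 public:
  explicit ToyThetaSketch(std::size_t k) : k_(k) {}

  void update(std::uint64_t item) { insert_hash(mix64(item)); }

  // Retained count divided by the sampled fraction of the hash space.
  double estimate() const {
    const double fraction =
        static_cast<double>(theta_) / 18446744073709551616.0;  // 2^64
    return static_cast<double>(hashes_.size()) / fraction;
  }

  static ToyThetaSketch set_union(const ToyThetaSketch& a,
                                  const ToyThetaSketch& b) {
    ToyThetaSketch r(std::min(a.k_, b.k_));
    r.theta_ = std::min(a.theta_, b.theta_);
    for (std::uint64_t h : a.hashes_) r.insert_hash(h);
    for (std::uint64_t h : b.hashes_) r.insert_hash(h);
    return r;
  }

  static ToyThetaSketch intersection(const ToyThetaSketch& a,
                                     const ToyThetaSketch& b) {
    ToyThetaSketch r(std::min(a.k_, b.k_));
    r.theta_ = std::min(a.theta_, b.theta_);
    // Keep hashes below the common threshold that appear in both inputs.
    for (std::uint64_t h : a.hashes_)
      if (h < r.theta_ && b.hashes_.count(h) != 0) r.hashes_.insert(h);
    return r;
  }

  static ToyThetaSketch a_not_b(const ToyThetaSketch& a,
                                const ToyThetaSketch& b) {
    ToyThetaSketch r(std::min(a.k_, b.k_));
    r.theta_ = std::min(a.theta_, b.theta_);
    // Keep hashes below the common threshold that appear only in a.
    for (std::uint64_t h : a.hashes_)
      if (h < r.theta_ && b.hashes_.count(h) == 0) r.hashes_.insert(h);
    return r;
  }

 private:
  void insert_hash(std::uint64_t h) {
    if (h >= theta_) return;  // outside the sampled region
    hashes_.insert(h);
    if (hashes_.size() > k_) {
      // Over capacity: drop the largest hash and lower theta to it.
      auto last = std::prev(hashes_.end());
      theta_ = *last;
      hashes_.erase(last);
    }
  }

  std::size_t k_;
  std::uint64_t theta_ = std::numeric_limits<std::uint64_t>::max();
  std::set<std::uint64_t> hashes_;
};

// Retention example with made-up user IDs: users 1..100 signed up in week 1,
// users 51..150 purchased in week 2.
double retention_example() {
  ToyThetaSketch signed_up(4096), purchased(4096);
  for (std::uint64_t user = 1; user <= 100; ++user) signed_up.update(user);
  for (std::uint64_t user = 51; user <= 150; ++user) purchased.update(user);
  return ToyThetaSketch::intersection(signed_up, purchased).estimate();
}
```

With fewer than k distinct items the toy sketch never lowers theta, so `retention_example()` returns the exact overlap. Once a sketch exceeds k items, the results become estimates with a relative error of roughly 1/sqrt(k). A production implementation would wrap the datasketches-cpp classes instead of reimplementing the algorithm.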

      General info about the sketch:
      https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html

      C++ implementation to wrap:
      https://github.com/apache/datasketches-cpp/tree/master/theta

      Using thetaSketch in Druid:
      https://druid.apache.org/docs/latest/development/extensions-core/datasketches-theta.html

          People

            Assignee: Unassigned
            Reporter: Fucun Chu (chufucun)