Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.12.0
    • Fix Version/s: 1.14.0
    • Component/s: druid
    • Labels:
      None

      Description

      Currently, the Druid adapter does not support the thetaSketch aggregate type, which is used to quickly estimate the cardinality of a column. Many Druid instances support theta sketches, so I think it would be a nice feature to have.

      I've been looking at the Druid adapter, and I propose we add a new DruidType called thetaSketch and then add logic in the getJsonAggregation method in class DruidQuery to generate the thetaSketch aggregate. This will require accessing information about the columns (what data type they are) so that the thetaSketch aggregate is only produced if the column's type is thetaSketch.
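      For concreteness, the JSON in question is the thetaSketch aggregator from Druid's datasketches extension; a minimal sketch of what getJsonAggregation would emit (the name and fieldName values here are illustrative):

      {
        "type": "thetaSketch",
        "name": "unique_users",
        "fieldName": "user_unique"
      }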

      Also, I've noticed that a hyperUnique DruidType is currently defined, but a hyperUnique aggregate is never produced. Since both are approximate aggregators, I could also couple in the logic for hyperUnique.

      I'd love to hear your thoughts on my approach, and any suggestions you have for this feature.

        Issue Links

          Activity

          Julian Hyde added a comment -

          Does it need to be a new type? Can it not just be a new expression (of type varbinary or whatever)? There's some precedent for this. We have an aggregate function called HISTOGRAM_AGG which represents a partially evaluated aggregate containing a sorted set. From this result you can derive other aggregates: min, max, median. Partial aggregates are common in databases so I'd rather have a simpler approach than adding a new type for each one.

          Zain Humayun added a comment - edited

          Apologies if I didn't fully understand your comment, but I have a few questions:
          1. If we were to create a new expression, how would Calcite know that the DB we are connecting to supports partial aggregates before the expression is built? For example, not all Druid instances support thetaSketches (also, not all metrics are of type thetaSketch), and the Druid adapter only gets this information in DruidConnectionImpl#metadata - after (I believe) the expressions have been derived.

          2. Can you give me a sample query where the HISTOGRAM_AGG is used?

          Thanks!

          Julian Hyde added a comment -

          Regarding 1. I'm assuming that you want to be able to write "select thetaSketch(customerId) from sales".

          If so, I think that's not a great idea, because it's not declarative. You don't write "emp MERGE JOIN dept", you write "emp JOIN dept" and let the optimizer decide which algorithm to use.

          I'd prefer we wrote "select count(distinct customerId) approximate (algorithm thetaSketch) from sales" or just "select count(distinct customerId) from sales" (using a session preference that thetaSketch or hyperLogLog can be used).

          Regarding 2. HISTOGRAM_AGG is still in the code (see SqlStdOperatorTable) but it is not currently used. When it was used, we would generate plans like this:

          SELECT orderId, productId, min(quantity) OVER w, max(quantity) OVER w
          FROM Orders
          WINDOW w AS (PARTITION BY productId
            ORDER BY orderTimestamp
            RANGE INTERVAL '1' HOUR PRECEDING)
          
          Project($0, $1, $HistogramMin($2), $HistogramMax($2))
            Window($0, $1, HISTOGRAM_AGG($3) over (partition by $1 order by $2 range interval '1' hour preceding))
              Scan(Orders)
          

          As you can see, we compute one aggregate, a histogram (basically TreeSet on top of a FIFO queue), then we have two extractor functions ($HistogramMin and $HistogramMax) to get the min and max from it.

          Zain Humayun added a comment - edited

          My aim is to be able to write something like

          SELECT COUNT(DISTINCT "col") FROM "table";

          and let Calcite generate one of:

          • thetaSketch aggregate
          • hyperUnique aggregate
          • cardinality aggregate (already in calcite, default aggregate for count distinct queries)

          Calcite can determine which aggregate to generate by looking at the DruidType for "col" from the metadata query. The logic for that would just go in the getJsonAggregation method in DruidQuery.
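          A rough sketch of that dispatch (illustrative only; the columnTypes lookup, the JsonAggregation classes, and their constructors are assumptions, not the actual patch):

          // Inside DruidQuery: choose a Druid aggregation for COUNT(DISTINCT fieldName)
          // based on the column's DruidType, as reported by Druid's metadata query.
          private JsonAggregation countDistinctAggregation(String name, String fieldName) {
            DruidType type = druidTable.columnTypes.get(fieldName); // assumed lookup
            if (type == DruidType.thetaSketch) {
              return new JsonAggregation("thetaSketch", name, fieldName);
            }
            if (type == DruidType.hyperUnique) {
              return new JsonAggregation("hyperUnique", name, fieldName);
            }
            // default: Druid's built-in cardinality aggregator (takes a list of fields)
            return new JsonCardinalityAggregation("cardinality", name,
                ImmutableList.of(fieldName));
          }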

          Zain Humayun added a comment - edited

          I've created two pull requests:
          1) For calcite (https://github.com/apache/calcite/pull/455)
          2) For the test dataset (https://github.com/vlsi/calcite-test-dataset/pull/21)

          2) is needed because the current Druid instance does not include the datasketches extension needed for thetaSketch. I've also added tests for the hyperUnique aggregator.

          slim bouguerra added a comment -

          In my opinion there is something missing here. The missing piece is the post aggregators; in fact, as per the theta sketch docs, you need to use both filtered aggregators and post aggregators to answer queries like

           select count(distinct "sketch_column") from table where conditionA and conditionB 

          I guess we need some groundwork for post aggregators and aggregated filters first. Please let me know if I am missing something.

          Zain Humayun added a comment - edited

          While I agree that post aggregations make theta sketch aggregators more useful, the theta sketch aggregators can still be used independently of post aggregators as the example shows in the docs you linked. CALCITE-1803 is currently open for post aggregations, so the thetaSketch post aggregation functionality should be added there.

          Does Druid currently support aggregated filters?

          slim bouguerra added a comment -

          Zain Humayun yes, Druid does handle that. I am not against partially using theta sketches, but it needs to be done in a way where we will not get wrong results, as per the example I stated; I would love to see testing around that.

          Julian Hyde added a comment -

          Review comments:

          • I think you should remove CalciteTrace.getDruidQueryInfoTracer(). Druid is an optional module, and there will be a ClassNotFoundException if people are running in an environment where calcite-druid is not on classpath.
          • Make DruidType a package-level class.
          • There are several occurrences of the string "thetaSketch" in the code. Replace them with DruidType.thetaSketch.
          • Use "DruidType.valueOf" to look up types.
          • In DruidType.create, combine fieldMap and columnTypeMap into a single `Map<String, Pair<SqlTypeName, DruidType>>`, and check that both types are not null.
          • Update druid_adapter.md

          Overall, looks good, if you can address slim bouguerra's concerns.

          Zain Humayun added a comment -

          slim bouguerra that seems reasonable. So I've taken the liberty of testing two queries, which should have the same semantics:

          1) select count(distinct "user_unique") from "foodmart" where "store_sales" >= 0 and "store_cost" >= 0;

          and

          2) select count(distinct "user_unique") from "foodmart";

          Both produce the same value, 5581. However, for #1 the plan looks something like:

          BindableAggregate(group=[{}], EXPR$0=[COUNT($0)])
            BindableAggregate(group=[{2}])
              BindableFilter(condition=[AND(>=($0, 0), >=($1, 0))])
                DruidQuery(table=[[foodmart, foodmart]], intervals=[...], projects=[[$90, $91, $92]])

          I noticed calcite doesn't push the where clause into the Druid query (in the "filter" field). Is this a missing feature/rule or am I misunderstanding something?

          Zain Humayun added a comment -

          Julian Hyde Those review comments are reasonable; I'll make those changes on my pull request.

          slim bouguerra added a comment -

          Zain Humayun the filter is not pushed because store_sales is a metric; currently Calcite pushes only Druid dimension filters. There is a gap there that can be fixed as well.
          Still, I am not sure I am following your example; what are you trying to test exactly?

          Zain Humayun added a comment - edited

          slim bouguerra Ahh, I see. I didn't realize store_sales was a metric. I was just trying to see if calcite would generate a filtered aggregate, but I suppose I'll need a different query for that.

          slim bouguerra added a comment -

          Zain Humayun the Druid-Calcite adapter does not generate aggregated filters.

          Julian Hyde added a comment -

          Can we all please use precise language? A "filtered aggregate" is not the same as an "aggregated filter". I suggest that we talk about relational operators (filter, aggregate, project, scan) and use prepositions (before, after). That is clearer than using adjectives ("filtered" or "post-") when English is not everyone's first language.

          CALCITE-1206 deals with HAVING (filter after aggregate).

          Zain Humayun added a comment -

          Julian Hyde I've pushed a new commit addressing your review comments. Please let me know if I missed something or if you have any other comments. Addressing slim bouguerra's concerns: I'm currently working with Junxian Wu to add post aggregation support to Calcite (we sit beside each other at work). Once I'm done with CALCITE-1787, I'll switch over to CALCITE-1803. Since Calcite does not currently have support for filtered aggregators, I was wondering if we should bundle that feature in with CALCITE-1803. slim bouguerra, could you comment on use cases for filtered aggregations in Calcite? i.e. queries in which it is advantageous to use filtered aggregators in addition to, or as opposed to, regular filters.

          slim bouguerra added a comment - edited

          Filters are applied to prune the rows before getting to the aggregation. A filtered aggregator is a kind of aggregator that allows you to prune rows while doing the aggregation.
          Thus the filter is the first funnel, and then we can have a more fine-grained filter per aggregator.
          So both play together in scenarios like the following.
          Assume your task is to compute the ratio of sales between two states, let's say CA and NY.
          To do this in an efficient way, the Druid query will have filter = rows that contain CA or NY, and then two filtered aggregators (the first contains filter = CA while the second has filter = NY).
          Thus in one pass over the data we are able to compute the SUM of sales for each state, and we can compute the ratio as a post aggregate.
          I hope you got the idea.
          I am not sure what the equivalent to this is in the realm of relational algebra; maybe Julian Hyde has better examples.
          Also, the Druid docs have a good explanation.
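          For illustration, a minimal sketch of such a Druid query (the datasource, column names, and interval are assumed), with one filter, two filtered aggregators, and an arithmetic post aggregator for the ratio:

          {
            "queryType": "timeseries",
            "dataSource": "sales_datasource",
            "granularity": "all",
            "filter": {
              "type": "or",
              "fields": [
                {"type": "selector", "dimension": "state", "value": "CA"},
                {"type": "selector", "dimension": "state", "value": "NY"}
              ]
            },
            "aggregations": [
              {
                "type": "filtered",
                "filter": {"type": "selector", "dimension": "state", "value": "CA"},
                "aggregator": {"type": "doubleSum", "name": "ca_sales", "fieldName": "sales"}
              },
              {
                "type": "filtered",
                "filter": {"type": "selector", "dimension": "state", "value": "NY"},
                "aggregator": {"type": "doubleSum", "name": "ny_sales", "fieldName": "sales"}
              }
            ],
            "postAggregations": [
              {
                "type": "arithmetic",
                "name": "ratio",
                "fn": "/",
                "fields": [
                  {"type": "fieldAccess", "fieldName": "ca_sales"},
                  {"type": "fieldAccess", "fieldName": "ny_sales"}
                ]
              }
            ],
            "intervals": ["2017-01-01/2017-12-31"]
          }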

          Julian Hyde added a comment -

          I'll say again. Please use precise language. Nouns and prepositions.

          Zain Humayun added a comment - edited

          Apologies Julian Hyde, I did not see your comment earlier when I posted. slim bouguerra thanks. I would also be interested to see what that kind of query looks like in SQL. Also, I know you've expressed concerns about this addition being incomplete; what are your thoughts on adding the missing pieces in another ticket, say CALCITE-1803?

          slim bouguerra added a comment -

          Zain Humayun in my opinion it is better to have something working correctly within one commit/bug_id. This makes code easier to read/maintain and use.

          Jesus Camacho Rodriguez added a comment - edited

          Zain Humayun, slim bouguerra, if I understand correctly, the query you are talking about could be expressed in SQL with CTEs:

          WITH cte AS
          (
              SELECT state, SUM(sales) AS state_sales
              FROM table_x
              WHERE state = 'CA' OR state = 'NY'
              GROUP BY state
          )
          SELECT (cte1.state_sales / cte2.state_sales) as ratio
          FROM cte cte1 CROSS JOIN cte cte2
          WHERE cte1.state = 'CA' AND cte2.state = 'NY'
          

          The problem is that currently CTEs are expanded by the optimizer, so you end up executing the WITH clause twice. It seems that until CALCITE-481 is completed and the Spool operator is introduced in Calcite, we will not be able to push this kind of query to Druid.

          Back to Zain Humayun's work: given the complexity of the extension, I was wondering whether this patch should already be checked in. slim bouguerra, do you have concerns about the correctness, or is it about the filtering extension? If I understand the discussion above correctly, the only issue is that we will not be pushing some of the filters to Druid, but maybe I am mistaken.

          If we do not produce incorrect results and we can get significant gains in some cases, I am inclined to check it in already.

          EDIT:
          Related to CALCITE-1828, I did not realize this could be made much simpler/more efficient without a CTE:

          SELECT (
              SUM(CASE WHEN state = 'CA' THEN sales END) /
              SUM(CASE WHEN state = 'NY' THEN sales END)
          ) as ratio
          FROM table_x
          
          slim bouguerra added a comment -

          As per the comments above, this will produce incorrect results when the user issues a query that has a conjunction of filters.
          In addition, we need to make sure that some tests are in place:
          For instance, I am wondering what happens if the user issues a query that groups by a sketch?
          Or what happens when the user issues a query with Sum(sketch)?

          Jesus Camacho Rodriguez added a comment -

          Thanks slim bouguerra.

          Zain Humayun, could you address slim bouguerra's concerns and add some additional testing? All the correctness issues should be addressed before checking the code in. Thanks

          Zain Humayun added a comment -

          slim bouguerra
          At the moment, an exception is thrown when the following group by query is issued:

          SELECT "user_unique", count("brand_name") FROM "foodmart" GROUP BY "user_unique";
          
          java.lang.IllegalStateException: Unhandled value type: class java.lang.String
          	at org.apache.calcite.avatica.util.AbstractCursor$BinaryAccessor.getString(AbstractCursor.java:813)
          	at org.apache.calcite.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:245)
          	at sqlline.Rows$Row.<init>(Rows.java:183)
          	at sqlline.IncrementalRows.hasNext(IncrementalRows.java:66)
          	at sqlline.TableOutputFormat.print(TableOutputFormat.java:33)
          	at sqlline.SqlLine.print(SqlLine.java:1663)
          	at sqlline.Commands.execute(Commands.java:833)
          	at sqlline.Commands.sql(Commands.java:732)
          	at sqlline.SqlLine.dispatch(SqlLine.java:813)
          	at sqlline.SqlLine.begin(SqlLine.java:686)
          	at sqlline.SqlLine.start(SqlLine.java:398)
          	at sqlline.SqlLine.main(SqlLine.java:291)
          

          It appears that columns with SQL type SqlTypeName.VARBINARY ("user_unique" from above) cause some trouble when being printed. To me, it doesn't seem to make much sense to allow queries that group by columns with type hyperUnique/thetaSketch; they have a very specialized purpose. I'm thinking we should display some sort of error message instead. If that is undesirable, then another solution would be to change the SQL type to SqlTypeName.VARCHAR and internally treat these columns as varchars. I'd be interested to hear your thoughts on this.

          Furthermore, the following sum query

          SELECT SUM("user_unique") FROM "foodmart";
          

          will fail because the SUM function expects a column of type NUMERIC, but "user_unique" is of type VARBINARY. This behavior is correct, and I'll add some test cases for it.

          Lastly, addressing filters: conjunctions of filters work fine when they're pushed into Druid, such as in the simple case below:

          SELECT COUNT(DISTINCT "user_unique") FROM "foodmart" WHERE "the_month" = 'April' AND "store_city" = 'Seattle';
          

          The potential issue I see is when there is a filter that cannot be pushed into Druid, such as trying to filter by another metric. In those cases I'm a little unclear on what the behavior should be, since Calcite will be handling the thetaSketch/hyperUnique objects returned directly.

          Ex:

          SELECT COUNT(DISTINCT "user_unique") FROM "foodmart" WHERE "store_sales" > 0;
          

          Calcite will retrieve raw Druid rows and then internally do a count distinct on the “user_unique” column.

          slim bouguerra added a comment - edited

          Zain Humayun please read the sketch docs.
          1 - I don't agree with the claim that queries like

           SELECT COUNT(DISTINCT "user_unique") FROM "foodmart" WHERE "store_city" = 'Chicago' AND "store_city" = 'Seattle'; 

          work fine.
          Pushing only filters to Druid will produce the wrong results; you need post aggregations and filtered aggregators to do the intersection between sketches. Without the intersection, the result you get is the union, which means you have counted duplicates, so you are not getting unique counts.
          2 - For filters on metrics, or the more general case when we cannot push the filter/query to Druid, Calcite in fact cannot do much; a sketch is a binary blob that needs a ser/deser library. I am not sure what the perfect path to take is; I don't know Calcite well enough to provide an answer to the question.

          slim bouguerra added a comment -

          For instance, if you want to query for how many unique users visited both product A and product B, the query to Druid should look like:

           
          {
            "queryType": "groupBy",
            "dataSource": "test_datasource",
            "granularity": "ALL",
            "dimensions": [],
            "filter": {
              "type": "or",
              "fields": [
                {"type": "selector", "dimension": "product", "value": "A"},
                {"type": "selector", "dimension": "product", "value": "B"}
              ]
            },
            "aggregations": [
              {
                "type" : "filtered",
                "filter" : {
                  "type" : "selector",
                  "dimension" : "product",
                  "value" : "A"
                },
                "aggregator" :     {
                  "type": "thetaSketch", "name": "A_unique_users", "fieldName": "user_id_sketch"
                }
              },
              {
                "type" : "filtered",
                "filter" : {
                  "type" : "selector",
                  "dimension" : "product",
                  "value" : "B"
                },
                "aggregator" :     {
                  "type": "thetaSketch", "name": "B_unique_users", "fieldName": "user_id_sketch"
                }
              }
            ],
            "postAggregations": [
              {
                "type": "thetaSketchEstimate",
                "name": "final_unique_users",
                "field":
                {
                  "type": "thetaSketchSetOp",
                  "name": "final_unique_users_sketch",
                  "func": "INTERSECT",
                  "fields": [
                    {
                      "type": "fieldAccess",
                      "fieldName": "A_unique_users"
                    },
                    {
                      "type": "fieldAccess",
                      "fieldName": "B_unique_users"
                    }
                  ]
                }
              }
            ],
            "intervals": [
              "2014-10-19T00:00:00.000Z/2014-10-22T00:00:00.000Z"
            ]
          }
          
          Julian Hyde added a comment -

          I don't think that "user_unique" should appear in queries. They want the (approximate) number of distinct users, not the number of distinct "user_unique" values. "user_unique" is an implementation detail.

          slim bouguerra, "how many unique users visited both product A and B" is an interesting query. But let's first look at how you'd write that in SQL (hint: no "user_unique") then figure out how to map onto Druid.

          Since we're all meeting this afternoon, can we discuss then?

          slim bouguerra added a comment -

          I will take back my first point; I am sorry, I misread the example. What I had in mind was queries like "how many unique users visited both product A and B", which I guess does not translate into SQL as

           SELECT COUNT(DISTINCT "user_unique") FROM "foodmart" WHERE "store_city" = 'Chicago' AND "store_city" = 'Seattle';  
          Julian Hyde added a comment - edited

          Both of the following queries are valid SQL and we should push them down to Druid and use theta sketches (or HLL, whichever is available):

          select count(distinct "customer_id")
          from "foodmart"
          where "store_city" in ('Chicago', 'Seattle')
          
          select count(distinct "customer_id") filter (where "store_city" in ('Chicago', 'Seattle'))
          from "foodmart"
          

          That's the what. Now let's figure out the how.
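          One possible "how", sketched as the Druid query the adapter might emit for the first statement, assuming it knows that "user_unique" is a sketch over "customer_id" (the interval and output name are illustrative):

          {
            "queryType": "timeseries",
            "dataSource": "foodmart",
            "granularity": "all",
            "filter": {
              "type": "in",
              "dimension": "store_city",
              "values": ["Chicago", "Seattle"]
            },
            "aggregations": [
              {"type": "thetaSketch", "name": "EXPR$0", "fieldName": "user_unique"}
            ],
            "intervals": ["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]
          }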

          Julian Hyde added a comment -

          Zain Humayun, I reviewed your pull request. I think we would be making a mistake to allow count(distinct sketchColumn) in SQL; the syntax should be count(distinct column) and Calcite should automatically rewrite to a sketch if possible. In particular the 3 test cases https://github.com/apache/calcite/pull/455/files#diff-88c776102776d0dfedaf74e4be27854fR2126 are

          select count(distinct \"user_unique\") as users from \"wiki\"
          select count(distinct \"added\") as \"added\" from \"wiki\"
          select count(distinct \"user_unique\") as users from \"foodmart\"
          

          but I think they should be

          select count(distinct \"user_id\") as users from \"wiki\"
          select count(distinct \"added\") as \"added\" from \"wiki\"
          select count(distinct \"customer_id\") as users from \"foodmart\"
          

          This means adding a "user_id" column to the "wiki" table, and Druid (or Calcite's Druid adapter) needs to know that the "user_unique" sketch is a "hyperUnique" sketch of "user_id".

          It is possible that "user_id" is never used directly; I'm no expert on designing Druid schemas, but I imagine it will use fewer resources if it is declared as a dimension rather than a metric.

          In "foodmart", the "customer_id" column already exists; we will need to make the Druid adapter know that "user_unique" is a "thetaSketch" of "customer_id".

          Lastly, all of these rewrites would be enabled only if the approximateDistinctCount property (added in CALCITE-1587) is true. Thus sql(sql) in your tests becomes something like this:

          CalciteAssert.that()
              .enable(enabled())
              .with(ImmutableMap.of("model", FOODMART.getPath()))
              .with(CalciteConnectionProperty.APPROXIMATE_DISTINCT_COUNT.name(), true)
              .query(sql)
          Joshua Walters added a comment - edited

          Julian Hyde There is a problem with this approach due to how Druid schemas are usually designed. This link explains it: http://druid.io/docs/latest/ingestion/schema-design.html#high-cardinality-dimensions-e-g-unique-ids

          In Druid, you don't want to store very high cardinality columns like user_id as dimensions, you want to store those as aggregates (sketches). This is because for each distinct dimension combination, there will be a rollup stored in Druid. If you have a dimension column with cardinality in the billions, then Druid will have to store billions of rows. In practice, if a dimension has a cardinality above a few hundred thousand in Druid, it should be a metric.

          In summary, if you have a column like user_id in Druid, you store it only as a metric, and never as a dimension. You can't filter on it, it can only be an output metric.

          Julian Hyde added a comment -

          Well, maybe in that case we should make the column "abstract". (For some definition of the word "abstract".) SQL's model is relational, so the experience to the user needs to be consistent with them querying a table with N rows of data. Even if those rows may exist only in summary, sampled or otherwise approximated form. (And may have different summaries tomorrow.)

          Julian Hyde added a comment -

          Another thought: would it be possible for the "user_id" column to only exist in Calcite's Druid adapter, not in Druid?

          Joshua Walters added a comment - edited

          The sketch column (theta, HLL, etc) has to exist in Druid as a metric column, Druid needs it when building the segment files.

          We could make the column abstract; the problem is how the user should be informed that this column can't be used for certain things. You can't filter on user_id = 123, for example, or get distinct values with DISTINCT user_id, but you can do COUNT(DISTINCT user_id). Eventually you could also do intersections and other set operations.

          The problem is that this column is a pre-computed binary metric.

          It almost seems like a UDF concept would be best here. The column is a binary type, which would correctly restrict usage in SQL from filters. If we could register a UDF to do COUNT DISTINCT or SET INTERSECTION we wouldn't have to muddle with the syntax of SQL. But then there is the overhead of having to build and register UDFs.

          Edit: Also, if we have a UDF concept, then computation could also be performed locally if the results are pulled in.

          Edit: Now that I think about it, a UDF might pose several more problems. I don't think there is a way to get the raw binary result from Druid, only the COUNT DISTINCT estimate.

          Joshua Walters added a comment -

          Another possibility, we could just do a column rename/mapping in the Calcite layer. user_unique could be mapped to user_id, Calcite would know that user_id is actually user_unique, and that this is a sketch column. There would have to be some way of letting Calcite know that the column is renamed.

          If a filter is performed it will fail with operation not supported. Would this be acceptable?

          slim bouguerra added a comment -

          I am wondering: what will the renaming buy us?

          Joshua Walters added a comment -

          The new user_id column would still be a sketch metric and it would be limited by the relevant type restrictions. It would be a superficial change.

          Zain Humayun added a comment -

          Joshua Walters I think the rename/mapping is essentially the same idea Julian Hyde had of only having a "user_id" column in the Druid adapter. That probably seems like the cleanest solution so far. I'm wondering how Calcite will become aware of such a column. Perhaps through the model definition?

          slim bouguerra added a comment -

          My 2 cents.
          I think renaming adds some complexity, and the outcome is similar-ish to leaving it as it is.
          I think going the UDF route is better. As you can see in the sketch-hive docs https://datasketches.github.io/docs/Theta/ThetaHiveUDFs.html, it is treated as a UDF, and I am assuming SQL/Hive users at Yahoo are already using those functions, so making Calcite inline with this syntax makes perfect sense to me.

          Julian Hyde added a comment -

          Regarding slim bouguerra's proposal to use user-defined aggregate functions. The experience for the end user wouldn't be quite as pleasant: the tool would have to know about the aggregate function, and also know about which sketch columns are available. But I wouldn't object to it.

          My position remains that sketches are an implementation detail, and that if you include them in the query the model is no longer declarative. It's exactly analogous to requiring users to rewrite their queries to reference the hidden table and columns that store a b-tree index if they want to use that index in a query. So the abstract "user_id" column, and a mapping onto its "user_unique" sketch column applied automatically by the planner, would still be the ideal solution.

          Joshua Walters added a comment -

          slim bouguerra: Yes, our Hive users use the UDFs for sketches, so they would be familiar with that approach.

          Julian Hyde: Both approaches sound reasonable, we can move forward with whichever you prefer.

          Just to define the scope of the "alias" solution (so we know what to develop):

          Taking this example: https://calcite.apache.org/docs/druid_adapter.html

          There is a column user, which contains user ids. It is converted to a metric and renamed to user_unique on Druid ingestion. What if, instead of this, it is converted to a metric but keeps the same name, user? Its type would still be hyperUnique, but as a hyperUnique column it could only be used in projections (with some aggregate functions like COUNT DISTINCT). If it is used in a filter or a group by, it would throw an error because the user column does not support such actions.

          Would this work? Or am I misunderstanding what you are looking for?

          Julian Hyde added a comment - edited

          The model currently has "dimensions" and "metrics" collections, and "user" belongs to "dimensions". I was thinking of removing "user" from "dimensions" and adding it to a new collection, "abstractMetrics":

            "abstractMetrics": [
              "user"
            ]
          

          "userUnique" continues to be a metric:

            {
              "name" : "user_unique",
              "type" : "hyperUnique",
              "fieldName" : "user"
            }
          

          Then if you write "select count(distinct user) ..." the adapter will use the "user_unique" sketch in the background; if you write "select user ..." the adapter will give an error because the raw values of "user" are not stored.

          If in future you add a histogram sketch:

            {
              "name" : "user_histogram",
              "type" : "histogram",
              "fieldName" : "user"
            }
          

          then you will be able to do approximate median queries: "select count(distinct user), median(user), count(user) filter (where user > 50), ..." and the adapter will use the appropriate histogram for each aggregate function.

          But notice that "user_histogram" and "user_unique" do not appear in queries. (You could use them in queries, but why would you?)

          bslim slim bouguerra added a comment -

          +1 for the idea of an abstract metric, or what we call in Druid a complex metric.
          Are we saying that for this to work the Druid user has to follow this naming convention for columns?
          Does this still work if we have multiple sketches for user? (It is a pretty common use case for a user to be tracked via multiple streams, hence multiple sketches.)
          How will Calcite be able to know whether a given sketch can be used as a histogram or a count?
          Keep in mind that hyperUnique, like Theta sketches or Quantile-Histogram, is a UDF, so we can have different UDFs that do the same thing in the same table, where each UDF has its own API and capabilities.
          As an example, Theta sketches (Yahoo sketches) and Druid HLL can both be used to compute a unique-user estimate, but a Theta sketch can do intersection/subtraction/union while HLL can only do union.

          julianhyde Julian Hyde added a comment -

          +1 for the idea of an abstract metric, or what we call in Druid a complex metric.

          OK, let's call this collection "complexMetrics".

          Are we saying that for this to work the Druid user has to follow this naming convention for columns?

          If by "druid user" you mean someone writing Druid JSON queries, they would ignore the "user" complex metric and write their queries in terms of the sketches "user_unique" etc.

          The person writing SQL would usually reference "user" in their SQL query. They could reference "user_unique" and "user_histogram" in their query but they would be VARBINARY values so there's not much they can do with them.

          Does this still work if we have multiple sketches for user? (It is a pretty common use case for a user to be tracked via multiple streams, hence multiple sketches.)

          Yes. I'm trying to make it easier to have multiple sketches for "user", and also to make it possible to add or remove sketches without rewriting the queries.

          How will Calcite be able to know whether a given sketch can be used as a histogram or a count?

          The mapping

            {
              "name" : "user_unique",
              "type" : "hyperUnique",
              "fieldName" : "user"
            }
          

          provides sufficient information for the planner to rewrite an approximate count(distinct user) to use Druid's hyperUnique aggregator.
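
          For example, an approximate "count(distinct user)" could become the following aggregation in the generated Druid query JSON (illustrative; the output name is assumed):

            {
              "type" : "hyperUnique",
              "name" : "unique_user_count",
              "fieldName" : "user_unique"
            }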

          Keep in mind that hyperUnique, like Theta sketches or Quantile-Histogram, is a UDF, so we can have different UDFs that do the same thing in the same table, where each UDF has its own API and capabilities. As an example, Theta sketches (Yahoo sketches) and Druid HLL can both be used to compute a unique-user estimate, but a Theta sketch can do intersection/subtraction/union while HLL can only do union.

          Calcite's planner will only be able to use UDFs that it is aware of. For this release, that will be "hyperUnique" and "thetaSketch". We can add more, but that will require a (small) code change to the Druid adapter.
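
          Putting the pieces together, a table definition under this proposal might look as follows (a sketch of the proposed model syntax, not a final spec; the dimension names are illustrative):

            {
              "dimensions": [ "channel", "page" ],
              "complexMetrics": [ "user" ],
              "metrics": [
                {
                  "name" : "user_unique",
                  "type" : "hyperUnique",
                  "fieldName" : "user"
                }
              ]
            }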

          zhumayun Zain Humayun added a comment -

          Looks good to me. I've closed my PR for now, and I'll open a new one once I get this implemented and CALCITE-1819 is ready for a PR. How should Calcite handle the case when two or more sketches point to the same fieldName? Julian Hyde, in your example above you removed "user" from "dimensions"; would it still be allowed to appear in "dimensions" in a valid model definition?

          julianhyde Julian Hyde added a comment -

          Sure, you can have "user" as a dimension, in which case you can query it in SQL, but (as I understand it) you incur extra storage cost in Druid.

          Or you can make "user" a "complexMetric", in which case you can query aggregates of "user" but not "user" directly.

          But either way, from a SQL perspective, "user" is a column.

          zhumayun Zain Humayun added a comment -

          Recap and some implementation questions:

          Columns of type thetaSketch/hyperUnique should be moved from the "metrics" field to a new "complexMetrics" field in the model definition.
          Each complex metric will have the form:

          {
            "name" : <name used in SQL statements>,
            "type" : <type>,
            "meticName" : <name of underlying metric in Druid>
          }
          

          This data will be saved into DruidTable. Note: while this information will be provided by model definitions, Calcite will have to rename any sketch columns in the metadata query (when the model definition isn't available).

          Calcite should reject any SQL statements that use complex metrics incorrectly. Ideally, complex metrics should be able to indicate to the validation code what kinds of statements they can be used in. Any ideas on the best way to do so? Where is the best place to interrupt the validation process and check for this kind of condition? At that point we'll also need access to the DruidTable, because it will hold the information about the columns.

          Once validation has finished, DruidQuery will be responsible for figuring out the actual (sketch) column based on the name and the context in which it's used.

          I believe the most complicated part of this will be validation. Do you have any general suggestions on where to start? I'm not very familiar with the calcite-core code. Thanks.
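
          For discussion, here is a rough Java sketch of the shape I have in mind (hypothetical; the class and method names are not Calcite's actual API):

            /** Hypothetical sketch: how DruidTable might record one complex metric. */
            class ComplexMetric {
              final String metricName; // underlying Druid metric, e.g. "user_unique"
              final String druidType;  // e.g. "hyperUnique" or "thetaSketch"

              ComplexMetric(String metricName, String druidType) {
                this.metricName = metricName;
                this.druidType = druidType;
              }

              /** Returns whether this sketch can implement the given kind of
               * aggregate call. Sketch columns support approximate
               * COUNT(DISTINCT ...) but cannot appear in plain projections,
               * filters or GROUP BY, because the raw values were discarded
               * at ingestion. */
              boolean canBeUsedFor(String aggregateKind) {
                return "COUNT_DISTINCT".equals(aggregateKind);
              }
            }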

          julianhyde Julian Hyde added a comment -

          I would remove the "metricName" field.

          Consider the case of the "user" field. We would allow it to be queried via "count(distinct user)" (if the "hyperUnique" metric exists) and maybe via "where user > 1000" (if some kind of histogram sketch exists) but we would not allow people to write "select user from table" because we do not store the raw data for "user".

          So, the "user" field is virtual. You can query certain expressions derived from "user", but you cannot query it itself. That's why I would remove the "metricName" field. A complex metric isn't derived from any other metric.

          I know a virtual field is a difficult concept for people to get their heads around. But it creates a greater simplicity, because it means we are presenting the data via the relational model. (Even though the relational data, the original rows and columns, has been discarded.)

          If someone tried to execute "select user from table", I would imagine that the adapter would throw. But if people would prefer that "user" evaluates to some expression, say 0, I could support that too.

          zhumayun Zain Humayun added a comment - - edited

          wouldn't the "metricName" field need to be there to tell calcite which sketch to refer to? For example,

          {
            "name" : "user",
            "type" : "hyperUnique",
            "meticName" : "user_unique"
          }
          

          "user_unique" would not be exposed to the person writing the query, and an error would be given if they tried to use it in a query. But, when "user" is used correctly in a statement, say, count(distinct user), then the druid adapter will know to use the "user_unique" column under the hood. In this case, "user" is not actually a column in Druid anywhere, and "user_unique" is defined as a metric in the Druid ingestion spec. Also, is there a specific existing exception the adapter should throw?

          julianhyde Julian Hyde added a comment -

          I can't think of a case where "name" and "metricName" would be different for a complex metric. (Unlike a regular metric, where there might be several metrics that are different aggregate functions applied to the same argument: "select sum(amount) as sum_amount, min(amount) as min_amount, hyperUnique(amount) as hyper_amount, ...".)

          zhumayun Zain Humayun added a comment -

          Ok, it's been a while, but I've submitted my updated PRs: https://github.com/apache/calcite/pull/503 https://github.com/vlsi/calcite-test-dataset/pull/22
          zhumayun Zain Humayun added a comment -

          Julian Hyde, slim bouguerra, or Jesus Camacho Rodriguez, can you please review when you get time? Thanks

          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Zain Humayun, I will take a look at this today. Thanks

          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Pushed in http://git-wip-us.apache.org/repos/asf/calcite/commit/025eaf1. Thanks for your contribution, Zain Humayun!
          zhumayun Zain Humayun added a comment -

          Jesus Camacho Rodriguez, it appears that mvn site fails because of my changes. The culprit seems to be Table.java line 77.

          {@param call}

          should be changed to

          {@code call}

          or <code>call</code> (not sure which style the Calcite community prefers). Do you want me to make another PR fixing this? Another JIRA ticket?

          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Thanks for catching this one, Zain Humayun. I have just pushed the fix in http://git-wip-us.apache.org/repos/asf/calcite/commit/ea4095a.

          michaelmior Michael Mior added a comment -

          Resolved in release 1.14.0 (2017-10-01)


            People

            • Assignee:
              zhumayun Zain Humayun
            • Reporter:
              zhumayun Zain Humayun