[KUDU-2490] implement Kudu DataSourceV2 and related classes - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: spark
Labels:
- roadmap-candidate

Description

The current Kudu-Spark bindings implement a DefaultSource that extends RelationProvider and hands BaseRelations to Spark. As I understand it, a BaseRelation is a physical unit of query execution that represents a set of rows. Kudu's BaseRelation, the KuduRelation, implements two traits to fit into Spark: PrunedFilteredScan, which allows column pruning and predicates to be pushed into Kudu, and InsertableRelation, which allows writes to be pushed into Kudu. The problem with these bindings is that, while they provide interfaces to insert and read data, they provide no interface for passing details back to Spark that could help it optimize a Kudu query.

Among other things, this limitation is inconvenient for any data source that wants to take such optimizations into its own hands. The Spark community appears to be addressing it by revamping the DataSource APIs in the form of DataSourceV2 and, for read support, the v2 DataSourceReader. The new API provides a clear path towards implementing optimizations that are unavailable with the current Kudu-Spark bindings, without pushing changes to Spark itself.

Of note, the v2 DataSourceReader can be extended with SupportsReportStatistics, which could allow Kudu to expose statistics to Spark without having to rely on the Hive Metastore (HMS), although pushing stats to HMS isn't an unreasonable approach either. More traits and details about the API can be found here.
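
To make the statistics idea concrete, here is a minimal sketch of how a reader could roll per-tablet numbers up into table-level statistics. Everything in it is an assumption for illustration: the Statistics and ReportsStatistics interfaces are local stand-ins that only mirror the general shape of Spark's v2 statistics trait (an optional size in bytes and an optional row count), and TabletSummary and KuduReaderSketch are hypothetical names, not classes from Kudu or Spark.

```java
import java.util.List;
import java.util.OptionalLong;

public class KuduStatsSketch {

    /** Local stand-in for a statistics holder: either value may be unknown. */
    interface Statistics {
        OptionalLong sizeInBytes();
        OptionalLong numRows();
    }

    /** Local stand-in for a statistics-reporting mix-in on a reader. */
    interface ReportsStatistics {
        Statistics estimateStatistics();
    }

    /** Hypothetical per-tablet summary; a negative row count means "unknown". */
    static final class TabletSummary {
        final long onDiskBytes;
        final long rowCount;

        TabletSummary(long onDiskBytes, long rowCount) {
            this.onDiskBytes = onDiskBytes;
            this.rowCount = rowCount;
        }
    }

    /** Hypothetical reader that aggregates tablet summaries into table-level statistics. */
    static final class KuduReaderSketch implements ReportsStatistics {
        private final List<TabletSummary> tablets;

        KuduReaderSketch(List<TabletSummary> tablets) {
            this.tablets = tablets;
        }

        @Override
        public Statistics estimateStatistics() {
            long bytes = 0;
            long rows = 0;
            boolean rowsKnown = true;
            for (TabletSummary t : tablets) {
                bytes += t.onDiskBytes;
                if (t.rowCount < 0) {
                    // One unknown tablet makes the table-level row count unknown.
                    rowsKnown = false;
                } else {
                    rows += t.rowCount;
                }
            }
            final OptionalLong size = OptionalLong.of(bytes);
            final OptionalLong numRows =
                rowsKnown ? OptionalLong.of(rows) : OptionalLong.empty();
            return new Statistics() {
                @Override public OptionalLong sizeInBytes() { return size; }
                @Override public OptionalLong numRows() { return numRows; }
            };
        }
    }
}
```

Returning an empty row count when any tablet is unknown, rather than a partial sum, keeps the planner from treating an underestimate as fact. In a real binding the stand-in interfaces would be replaced by the ones Spark ships, and the tablet numbers would have to come from wherever Kudu ends up exposing them (see KUDU-2019).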

Issue Links

is related to

KUDU-2518 SparkSQL queries without temporary tables

Open

relates to

KUDU-2019 Expose table/column statistics

Open

KUDU-2515 Implement Spark join optimization support

Open

People

Assignee: Mayank Asthana

Reporter: Andrew Wong

Votes: 0

Watchers: 3

Dates

Created: 02/Jul/18 17:58

Updated: 08/Feb/21 15:23