[SPARK-22386] Data Source V2 improvements - ASF JIRA

Details

Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.3.0
Fix Version/s: None
Component/s: SQL
Labels:
- releasenotes

Target Version/s: 3.0.0

Issue Links

depends upon

SPARK-25531 new write APIs for data source v2

Resolved

is duplicated by

SPARK-9182 filter and groupBy on DataFrames are not passed through to jdbc source

Resolved

is related to

SPARK-23521 SPIP: Standardize SQL logical plans with DataSourceV2

Resolved

relates to

SPARK-26088 DataSourceV2 should expose row count and attribute statistics

Resolved

Sub-Tasks

1.	Limit push down	Resolved	Unassigned
2.	Aggregate push down	Resolved	Unassigned
3.	add `MetadataCreationSupport` trait to separate data and metadata handling at write path	Resolved	Unassigned
4.	DataSourceV2 should use immutable trees.	Resolved	Ryan Blue
5.	DataSourceV2 should support named tables in DataFrameReader, DataFrameWriter	Resolved	Unassigned
6.	Reorganize packages in data source V2	Resolved	Gengliang Wang
7.	DataSourceV2 should apply some validation when writing.	Resolved	Unassigned
8.	DataSourceV2 should use the output commit coordinator.	Resolved	Ryan Blue
9.	DataSourceV2 readers should always produce InternalRow.	Resolved	Ryan Blue
10.	DataSourceOptions should handle path and table names to avoid confusion.	Resolved	Wenchen Fan
11.	use InternalRow in DataSourceWriter	Resolved	Wenchen Fan
12.	DataSourceV2 should provide a way to get a source's schema.	Resolved	Unassigned
13.	DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema	Resolved	Ryan Blue
14.	DataSourceV2: Rename DataReaderFactory to InputPartition.	Resolved	Ryan Blue
15.	Data Source V2: Join Push Down	Resolved	Unassigned
16.	DataSourceV2 should push filters and projection at physical plan conversion	Resolved	Ryan Blue
17.	remove SupportsDeprecatedScanRow	Resolved	Wenchen Fan
18.	Add support for USING syntax for DataSourceV2	Resolved	Unassigned
19.	merge ReadSupport and ReadSupportWithSchema	Resolved	Wenchen Fan
20.	DataSourceV2: Remove SupportsPushDownCatalystFilters	Resolved	Reynold Xin
21.	DataSourceV2: Add interfaces to pass required sorting and clustering for writes	Resolved	Unassigned
22.	DataSourceV2: Structured Streaming does not respect SessionConfigSupport	Resolved	Hyukjin Kwon
23.	Avoid to create a readsupport at write path in Data Source V2	Resolved	Hyukjin Kwon
24.	Recover options and properties and pass them back into the v1 API	Open	Unassigned
25.	DataSourceV2: Add new DataFrameWriter API for v2	Resolved	Ryan Blue
26.	Pass in number of partitions to BuildWriter	Resolved	Ximo Guanter
27.	DataSource V2: API to request distribution and ordering on write	Resolved	Anton Okolnychyi
28.	Data Source V2: Remove read specific distributions	Open	Unassigned
29.	DataSource V2: Build logical writes in the optimizer	Resolved	Anton Okolnychyi
30.	DataSource V2: Inject repartition and sort nodes to satisfy required distribution and ordering	Resolved	Anton Okolnychyi
31.	DataSource V2: Use Write abstraction in StreamExecution	Resolved	Anton Okolnychyi
32.	DataSource V2: Support required distribution and ordering in SS	Resolved	Anton Okolnychyi
33.	Let AQE determine the right parallelism in DistributionAndOrderingUtils	Open	Unassigned
34.	DS V2 Aggregate push down	Resolved	Huaxin Gao
35.	Aggregate (Min/Max/Count) push down for ORC	Resolved	Cheng Su
36.	Aggregate (Min/Max/Count) push down for Parquet	Resolved	Huaxin Gao
37.	Push down group by partition column for Aggregate (Min/Max/Count) for Parquet	Resolved	Huaxin Gao
38.	Push down filter by partition column for Aggregate (Min/Max/Count) for Parquet	Resolved	Huaxin Gao
39.	Add benchmark for aggregate push down	Open	Unassigned
40.	Do not split input file for Parquet reader with aggregate push down	Resolved	Cheng Su
41.	Not log empty aggregate and group by in JDBCScan	Resolved	Huaxin Gao
42.	DataSourceV2: Distribution and ordering support V2 function in writing	Resolved	Cheng Pan