[SPARK-24073] DataSourceV2: Rename DataReaderFactory to InputPartition. - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.4.0
Component/s: SQL
Labels: None

Description

Just before 2.3.0, SPARK-23219 renamed ReadTask to DataReaderFactory. The intent was to make the read and write API match (write side uses DataWriterFactory), but the underlying problem is that the two classes are not equivalent.

ReadTask and DataReader function as an Iterable/Iterator pair. A ReadTask is specific to a single read task, in contrast to DataWriterFactory, where the same factory instance is used in all write tasks. ReadTask's purpose is to manage the lifecycle of a DataReader, with an explicit create operation to mirror the close operation. This is no longer clear from the API: DataReaderFactory appears to be more generic than it is, and it isn't clear why a set of them is produced for a read.
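The Iterable/Iterator relationship can be illustrated with a minimal Java sketch. The interfaces below are simplified stand-ins written for this example, not the actual Spark signatures, and `RangeReadTask` and `readAll` are hypothetical names. The sketch assumes one task object per split of the input, each creating a reader that the caller later closes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReadTaskSketch {
    /** Iterator-like: an open reader over one split of the input. */
    interface DataReader<T> extends AutoCloseable {
        boolean next();          // advance; false when the split is exhausted
        T get();                 // current record, valid after next() returned true
        @Override void close();  // release resources held by this reader
    }

    /** Iterable-like: describes one split and creates the reader for it. */
    interface ReadTask<T> {
        DataReader<T> createDataReader();
    }

    /** Hypothetical task that reads the integers in [start, end). */
    static final class RangeReadTask implements ReadTask<Integer> {
        private final int start;
        private final int end;

        RangeReadTask(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public DataReader<Integer> createDataReader() {
            return new DataReader<Integer>() {
                private int current = start - 1;

                @Override public boolean next() {
                    if (current < end) current++;
                    return current < end;
                }
                @Override public Integer get() { return current; }
                @Override public void close() { /* nothing to release here */ }
            };
        }
    }

    /** Runs one task: the explicit create is paired with the close. */
    static <T> List<T> readAll(ReadTask<T> task) {
        List<T> rows = new ArrayList<>();
        try (DataReader<T> reader = task.createDataReader()) {
            while (reader.next()) {
                rows.add(reader.get());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // A read produces a set of tasks, one per split; each is run independently.
        List<ReadTask<Integer>> tasks =
            Arrays.asList(new RangeReadTask(0, 3), new RangeReadTask(3, 5));
        for (ReadTask<Integer> task : tasks) {
            System.out.println(readAll(task));
        }
    }
}
```

In this sketch each task carries the state for its own split (the range bounds), which is why a read yields several of them, whereas a writer factory in the sense described above could be a single shared instance.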

We should rename DataReaderFactory back to ReadTask, which correctly conveys the purpose and use of the class.

Issue Links

links to

[Github] Pull Request #21145 (rdblue)

People

Assignee: Ryan Blue

Reporter: Ryan Blue

Votes: 0

Watchers: 2

Dates

Created: 24/Apr/18 19:46

Updated: 29/May/18 14:04

Resolved: 10/May/18 04:49