[SPARK-26744] Support schema validation in File Source V2 - ASF JIRA


Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.0.0
Component/s: SQL
Labels:
None

Description

The internal API supportDataType in FileFormat validates the output/input schema before task execution starts, so that we can avoid launching read/write tasks that would fail. Users also see clean error messages.
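A minimal sketch of this kind of up-front check follows, written in Scala because Spark is. It does not use Spark's classes. The `DataType` hierarchy, `SchemaValidation.verifySchema`, and the two example predicates are hypothetical stand-ins. The sketch assumes only that `supportDataType` acts as a per-format Boolean predicate over a data type. In this model, the check visits every top-level column and throws a descriptive error before any read/write work is scheduled.

```scala
// Hypothetical, simplified model of Spark SQL data types (not Spark's real classes).
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case object CalendarIntervalType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType

object SchemaValidation {
  // Hypothetical helper: fail fast, before any read/write task is launched,
  // if the format's predicate rejects the type of any top-level column.
  def verifySchema(formatName: String, schema: StructType, supportDataType: DataType => Boolean): Unit =
    schema.fields.foreach { field =>
      if (!supportDataType(field.dataType)) {
        throw new UnsupportedOperationException(
          s"$formatName data source does not support ${field.dataType} (column '${field.name}')")
      }
    }

  // Example predicate for a flat, CSV-like format: atomic types only.
  def flatSupport(dataType: DataType): Boolean = dataType match {
    case IntegerType | StringType => true
    case _                        => false
  }

  // Example predicate for a nested format: recurse into complex types, reject intervals.
  def nestedSupport(dataType: DataType): Boolean = dataType match {
    case CalendarIntervalType => false
    case ArrayType(element)   => nestedSupport(element)
    case MapType(key, value)  => nestedSupport(key) && nestedSupport(value)
    case StructType(fields)   => fields.forall(f => nestedSupport(f.dataType))
    case _                    => true
  }
}
```

In this model, a schema with an `ArrayType(StringType)` column passes `verifySchema` with `nestedSupport` but throws with `flatSupport`. The predicate is the only format-specific part; the fail-fast loop is shared.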

This PR implements the same internal API in the FileDataSourceV2 framework. Compared to FileFormat, FileDataSourceV2 has multiple layers, so the API is added in two places:

1. Read path: the table schema is determined in TableProvider.getTable. The actual read schema can be a subset of the table schema. This PR proposes to validate the actual read schema in FileScan.
2. Write path: validate the actual output schema in FileWriteBuilder.
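The two-place design above can be sketched as follows, again under loose assumptions. `Column`, `FormatSupport`, `Validation.verify`, `SketchFileScan`, and `SketchWriteBuilder` are hypothetical simplifications. They do not reproduce the real constructors or methods of `FileScan` or `FileWriteBuilder`. The sketch shows only where each check happens: the scan validates the pruned read schema, while the write builder validates the full output schema.

```scala
// Hypothetical column model: a name plus a type name (not Spark's StructType).
final case class Column(name: String, dataType: String)

// Hypothetical per-format predicate, playing the role of supportDataType.
trait FormatSupport {
  def formatName: String
  def supportDataType(dataType: String): Boolean
}

object Validation {
  // Throws before any task is launched if some column's type is unsupported.
  def verify(path: String, columns: Seq[Column], format: FormatSupport): Unit =
    columns.find(c => !format.supportDataType(c.dataType)).foreach { c =>
      throw new UnsupportedOperationException(
        s"${format.formatName} ($path path) does not support type ${c.dataType} of column '${c.name}'")
    }
}

// Read path: the table schema comes from the table, but only the pruned
// read schema (the columns the query actually needs) is validated.
final class SketchFileScan(tableSchema: Seq[Column], requiredColumns: Set[String], format: FormatSupport) {
  val readSchema: Seq[Column] = tableSchema.filter(c => requiredColumns.contains(c.name))
  Validation.verify("read", readSchema, format)
}

// Write path: the full output schema is validated when the write is built.
final class SketchWriteBuilder(outputSchema: Seq[Column], format: FormatSupport) {
  def build(): Unit = {
    Validation.verify("write", outputSchema, format)
    // Setting up the actual write job is omitted in this sketch.
  }
}
```

One consequence under this model: a query that never reads an unsupported column is not rejected, because that column is pruned from the read schema before validation. A write of the same table, by contrast, fails up front.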


Issue Links

links to

GitHub Pull Request #23714

GitHub Pull Request #23828


People

Assignee: Gengliang Wang

Reporter: Gengliang Wang

Votes: 0

Watchers: 2

Dates

Created: 27/Jan/19 05:53

Updated: 29/Apr/19 05:46

Resolved: 16/Feb/19 09:21