[SPARK-26871] File Source V2: avoid creating unnecessary FileIndex in the write path - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.0.0
Component/s: SQL
Labels: None

Description

The file source V2 framework was implemented in https://github.com/apache/spark/pull/23383. In that PR, FileIndex is created as a member of FileTable so that partition pruning, as in https://github.com/apache/spark/commit/0f9fcabb4ac2e8afec14d010e86467372a85d334, can be implemented in the future. (Partition pruning was removed from the PR because the data source V2 catalog is still under development.)

However, after the write path of file source V2 was implemented, I found that even a simple write creates a FileIndex, because FileTable requires one. This is a regression of sorts.
This PR makes FileIndex a lazy value in FileTable, so that the write path no longer creates an unnecessary FileIndex.
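
To illustrate the idea, here is a minimal, self-contained Scala sketch of deferring index creation with a `lazy val`. It is not the code from the PR: `SketchFileIndex`, `SketchFileTable`, the `indexBuilds` counter, and the `scan`/`write` methods are hypothetical stand-ins, and the real FileIndex and FileTable have different constructors and members.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
object LazyFileIndexSketch {
  // Counts how many times the (potentially expensive) index is constructed.
  var indexBuilds: Int = 0

  // Stand-in for a FileIndex. In a real file source, building the index
  // may involve listing files under the given paths.
  final class SketchFileIndex(paths: Seq[String]) {
    indexBuilds += 1
    def listFiles(): Seq[String] = paths.map(p => s"$p/part-00000")
  }

  // Stand-in for FileTable.
  final class SketchFileTable(paths: Seq[String]) {
    // `lazy val`: the index is built on first access, not at construction.
    lazy val fileIndex: SketchFileIndex = new SketchFileIndex(paths)

    // The read path touches the index, which forces its creation.
    def scan(): Seq[String] = fileIndex.listFiles()

    // The write path never touches the index, so it is never built here.
    def write(rows: Seq[String]): Int = rows.length
  }
}
```

In this sketch, constructing a `SketchFileTable` and calling `write` leaves `indexBuilds` at zero. The first call to `scan` builds the index once, and later calls reuse it.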

Issue Links

links to

GitHub Pull Request #23774

People

Assignee: Gengliang Wang

Reporter: Gengliang Wang

Votes: 0

Watchers: 2

Dates

Created: 13/Feb/19 15:50

Updated: 29/Apr/19 05:46

Resolved: 15/Feb/19 06:59