Spark
SPARK-27589
Spark file source V2
Details
Type: Umbrella
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: SQL
Labels: None
Description
Re-implement file sources with data source V2 API
Sub-Tasks
1. Implement `doCanonicalize` in BatchScanExec for comparing query plan results (Resolved, Gengliang Wang)
2. File source V2: return correct result for Dataset.inputFiles() (Resolved, Gengliang Wang)
3. File source V2: support refreshing metadata cache (Resolved, Gengliang Wang)
4. Revise the exception message of schema inference failure in file source V2 (Resolved, Gengliang Wang)
5. File source V2 table provider should be compatible with V1 provider (Resolved, Gengliang Wang)
6. Support UDF input_file_name in file source V2 (Resolved, Gengliang Wang)
7. Support schema pruning in Orc V2 (Resolved, Gengliang Wang)
8. Migrate Parquet to File Data Source V2 (Resolved, Gengliang Wang)
9. File source V2: Invalidate cache data on overwrite/append (Resolved, Gengliang Wang)
10. File source V2: Prune unnecessary partition columns (Resolved, Gengliang Wang)
11. File source V2: return actual schema in method `FileScan.readSchema` (Resolved, Gengliang Wang)
12. Fall back all v2 file sources in `InsertIntoTable` to V1 FileFormat (Resolved, Gengliang Wang)
13. File source V2: Ignore empty files in load (Resolved, Gengliang Wang)
14. Handles exceptions on proceeding to next record in FilePartitionReader (Resolved, Gengliang Wang)
15. Migrate Text to File Data Source V2 (Resolved, Unassigned)
16. File source v2 should validate data schema only (Resolved, Gengliang Wang)
17. Improve file source V2 framework (Resolved, Gengliang Wang)
18. Migrate JSON to File Data Source V2 (Resolved, Gengliang Wang)
19. Migrate CSV to File Data Source V2 (Resolved, Gengliang Wang)
20. Remove data source option check_files_exist (Resolved, Gengliang Wang)
21. Support handling partition values in the abstraction of file source V2 (Resolved, Gengliang Wang)
22. File Source V2: avoid creating unnecessary FileIndex in the write path (Resolved, Gengliang Wang)
23. Support schema validation in File Source V2 (Resolved, Gengliang Wang)
24. File source V2 write: create framework and migrate ORC to it (Resolved, Gengliang Wang)
25. Allow OrcColumnarBatchReader to return less partition columns (Resolved, Gengliang Wang)
26. Create file source V2 framework and migrate ORC read path (Resolved, Gengliang Wang)
27. File source V2: support reporting statistics (Resolved, Gengliang Wang)
28. Redact treeString of FileTable and DataSourceV2ScanExecBase (Resolved, Gengliang Wang)
29. Allow altering table add columns with CSVFileFormat/JsonFileFormat provider (Resolved, Gengliang Wang)
30. File source v2: support reading output of file streaming Sink (Resolved, Gengliang Wang)
31. useV1SourceList configuration should be for all data sources (Resolved, Gengliang Wang)
32. Migrate Avro to File source V2 (Resolved, Gengliang Wang)
33. Add PathCatalog for data source V2 (Resolved, Unassigned)
34. File source V2: support partition pruning (Resolved, Gengliang Wang)
35. Disable all the V2 file sources in Spark 3.0 by default (Resolved, Gengliang Wang)
36. File source V2: Support partition pruning with subqueries (Open, Unassigned)
37. Add V1/V2 tests for TextSuite and WholeTextFileSuite (Resolved, Gengliang Wang)
38. File source V2: support bucketing (Open, Unassigned)
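The sub-tasks on the `useV1SourceList` configuration and on disabling the V2 file sources by default suggest that a comma-separated list of source names decides which sources fall back to the V1 code path. The following is a minimal sketch of that idea in plain Python, not Spark's actual implementation. The helper `falls_back_to_v1`, the full key name `spark.sql.sources.useV1SourceList`, and the default list shown are assumptions made for illustration; none of them is stated in this issue.

```python
# Sketch only: models a "use V1 for these sources" list as a plain string.
# The default value below is an assumption about Spark 3.0, not taken from this issue.
ASSUMED_DEFAULT_V1_LIST = "avro,csv,json,kafka,orc,parquet,text"


def falls_back_to_v1(source_name: str, use_v1_source_list: str) -> bool:
    """Hypothetical helper: True if `source_name` appears in the comma-separated list.

    Matching is case-insensitive and ignores surrounding whitespace; empty
    entries (for example from an empty list) are skipped.
    """
    v1_sources = {
        name.strip().lower()
        for name in use_v1_source_list.split(",")
        if name.strip()
    }
    return source_name.strip().lower() in v1_sources


if __name__ == "__main__":
    # With the assumed default, every listed file source stays on V1.
    for fmt in ("orc", "parquet", "csv"):
        print(fmt, falls_back_to_v1(fmt, ASSUMED_DEFAULT_V1_LIST))
    # With an empty list, nothing is forced back to V1.
    print("orc", falls_back_to_v1("orc", ""))
```

Under this reading, removing a format from the list (or emptying the list) would presumably opt that format into the V2 implementation, which matches the intent of keeping V2 file sources disabled by default.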
People
Assignee: Unassigned
Reporter: Gengliang Wang
Votes: 1
Watchers: 9
Dates
Created: 29/Apr/19 05:21
Updated: 27/Jan/21 21:47