Description
A schema can evolve in several ways, and the following are already supported in file-based data sources.
1. Add a column
2. Remove a column
3. Change a column position
4. Change a column type
This issue aims to guarantee backward-compatible schema evolution coverage for users of file-based data sources, and to prevent future regressions by adding explicit schema evolution test suites.
Here, we consider only safe evolution without data loss. For example, a data type should evolve from a smaller type to a larger one, such as `int` to `long`, not vice versa.
As of today, in the master branch, file-based data sources have the following schema evolution coverage. The numbers refer to the evolution types listed above.
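The "small to larger" rule can be sketched as a simple ordering check. This is only an illustration: the type names, the two ordered families, and the helper `is_safe_widening` are assumptions made for this example, not the rule set any data source actually implements.

```python
# Hypothetical widening orders, for illustration only.
# Each list goes from the narrowest type to the widest type in its family.
INTEGRAL_ORDER = ["byte", "short", "int", "long"]
FRACTIONAL_ORDER = ["float", "double"]


def is_safe_widening(from_type, to_type):
    """Return True if to_type is the same as or wider than from_type
    within one family, so every stored value still fits after the change."""
    for order in (INTEGRAL_ORDER, FRACTIONAL_ORDER):
        if from_type in order and to_type in order:
            return order.index(from_type) <= order.index(to_type)
    # Cross-family or unknown conversions are conservatively rejected
    # in this sketch.
    return False


print(is_safe_widening("int", "long"))   # widening: allowed
print(is_safe_widening("long", "int"))   # narrowing: rejected
```

The sketch deliberately rejects anything it does not recognize, which matches the intent of covering only evolution that cannot lose data.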
File Format | Coverage | Note |
---|---|---|
TEXT | N/A | Schema consists of a single string column. |
CSV | 1, 2, 4 | |
JSON | 1, 2, 3, 4 | |
ORC | 1, 2, 3, 4 | Native vectorized ORC reader has the widest coverage. |
PARQUET | 1, 2, 3 | |
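For test planning, the coverage table can be written down as data and queried. The following is a minimal sketch that transcribes the table above; the names `EVOLUTIONS`, `COVERAGE`, `formats_supporting`, and `missing_coverage` are hypothetical helpers introduced here, and TEXT (N/A in the table) is modeled as an empty set.

```python
# Evolution types, numbered as in the description.
EVOLUTIONS = {
    1: "add a column",
    2: "remove a column",
    3: "change a column position",
    4: "change a column type",
}

# Coverage per file format, transcribed from the table.
# TEXT is N/A (single string column), modeled here as an empty set.
COVERAGE = {
    "TEXT": set(),
    "CSV": {1, 2, 4},
    "JSON": {1, 2, 3, 4},
    "ORC": {1, 2, 3, 4},
    "PARQUET": {1, 2, 3},
}


def formats_supporting(evolution):
    """Return the formats whose coverage includes the given evolution number."""
    return sorted(fmt for fmt, covered in COVERAGE.items() if evolution in covered)


def missing_coverage(fmt):
    """Return the evolution numbers the given format does not cover."""
    return sorted(set(EVOLUTIONS) - COVERAGE[fmt])


print(formats_supporting(4))
print(missing_coverage("PARQUET"))
```

A lookup like this makes it easy to see which format and evolution pairs a test suite should expect to pass and which ones are known gaps.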
Issue Links
- blocks: SPARK-20901 Feature parity for ORC with Parquet (Open)
- is related to: SPARK-35461 Error when reading dictionary-encoded Parquet int column when read schema is bigint (Open)