Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 3.3.0
- None
Description
Field ID is a native field in the Parquet schema (https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L398).
After this PR, when the requested schema has field IDs, Parquet readers first use the field ID to determine which Parquet columns to read, and fall back to matching by column name as before. This enables matching columns by field ID for data warehouses that support it, such as Iceberg and Delta.
This PR supports:
- vectorized reader
- parquet-mr reader
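The ID-first, name-second lookup described above can be sketched in plain Python. This is only an illustration of the idea, not Spark's implementation: the `Column` type and `resolve_columns` function are hypothetical names, and the per-column fallback to the name when an ID is not found in the file is one assumed reading of the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    """A column with a name and an optional field ID (hypothetical type)."""
    name: str
    field_id: Optional[int] = None


def resolve_columns(requested, file_columns):
    """Map each requested column name to the matching file column, or None.

    Assumption: a requested column that carries a field ID is matched by ID
    first and falls back to its name when no file column has that ID; a
    requested column without an ID is matched by name only.
    """
    by_id = {c.field_id: c for c in file_columns if c.field_id is not None}
    by_name = {c.name: c for c in file_columns}

    resolved = {}
    for req in requested:
        match = None
        if req.field_id is not None:
            match = by_id.get(req.field_id)  # try the field ID first
        if match is None:
            match = by_name.get(req.name)    # fall back to the column name
        resolved[req.name] = match
    return resolved


# The file has IDs on two columns; column "b" was written without an ID.
file_columns = [Column("id", 1), Column("old_name", 2), Column("b")]
# The reader asks for a renamed column (ID 2), a name-only column,
# and a column whose ID does not exist in the file.
requested = [Column("new_name", 2), Column("b"), Column("id", 99)]

resolved = resolve_columns(requested, file_columns)
```

In this sketch the renamed column `new_name` still resolves to the file column `old_name` because both carry ID 2, which is the kind of rename tolerance that ID-based matching is meant to provide.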
Issue Links
- Blocked
  - SPARK-39997 ParquetSchemaConverter fails match schema by id (In Progress)
- links to