Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Fixed
- 1.0.1
Description
When writing Arrow data to Parquet, we serialise the schema's IPC representation. This schema is then read back by the Parquet reader, and used to preserve the array type information from the original Arrow data.
However, we do not rely on this mechanism when reading projected columns from a Parquet file. For example, if a file has 3 columns but we read only 2 of them, the serialised Arrow schema is not consulted, and type information can be lost.
Support for projected reads was deliberately left out, as the function parquet_to_arrow_schema_by_columns does not check for the existence of an Arrow schema in the metadata.
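One way the missing check could work is sketched below. This is a toy Rust model, not the parquet crate's code: DataType, Field, FileMetadata, project_schema and demo are hypothetical names, and the embedded schema is modelled as an already-decoded list of fields rather than IPC bytes. It assumes the projection is given as column indices. For each selected column it prefers the type recorded in the embedded Arrow schema and falls back to the type derived from the Parquet schema.

```rust
// Toy model only: none of these types come from the arrow or parquet crates.

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    // A type that the Parquet schema alone cannot distinguish from Utf8.
    LargeUtf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

// Hypothetical stand-in for a Parquet file's metadata.
struct FileMetadata {
    // Field types derived from the Parquet schema alone.
    parquet_fields: Vec<Field>,
    // The Arrow schema embedded by the writer, if any (assumed already decoded).
    embedded_arrow_fields: Option<Vec<Field>>,
}

// Build the schema for a projection given as column indices.
// Out-of-range indices are skipped in this sketch.
fn project_schema(meta: &FileMetadata, columns: &[usize]) -> Vec<Field> {
    columns
        .iter()
        .filter_map(|&i| {
            let derived = meta.parquet_fields.get(i)?;
            // Prefer the embedded Arrow field with the same name, if present.
            let preserved = meta
                .embedded_arrow_fields
                .as_ref()
                .and_then(|fields| fields.iter().find(|f| f.name == derived.name));
            Some(preserved.unwrap_or(derived).clone())
        })
        .collect()
}

// A file with 3 columns, of which only the first 2 are read.
fn demo() -> Vec<Field> {
    let field = |name: &str, data_type: DataType| Field {
        name: name.to_string(),
        data_type,
    };
    let meta = FileMetadata {
        parquet_fields: vec![
            field("id", DataType::Int64),
            field("name", DataType::Utf8),
            field("note", DataType::Utf8),
        ],
        embedded_arrow_fields: Some(vec![
            field("id", DataType::Int64),
            field("name", DataType::LargeUtf8),
            field("note", DataType::Utf8),
        ]),
    };
    project_schema(&meta, &[0, 1])
}
```

In this sketch, the projected "name" column keeps its original LargeUtf8 type because the embedded schema is consulted. Without it, the column would come back as the Utf8 type derived from the Parquet schema.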