Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
This question came up in GitHub issue https://github.com/apache/arrow/issues/14025.
If a user wants to change the type of a single column when using to_parquet in pandas (or Dask), they currently need to specify a schema that includes all columns: any column omitted from the schema is dropped from the resulting Parquet file.
Type inference happens when converting a Python object (e.g. a pandas DataFrame, or a dict) to an Arrow Table. Once such a table exists with a fixed schema, writing to Parquet performs no further type inference, since Arrow types map directly to Parquet types.
Proposal
There should be a way to specify the types of only a subset of columns for from_pandas, with the remaining columns keeping their inferred types.