Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
This question came up in GitHub issue https://github.com/apache/arrow/issues/14025.
If a user wants to change the type of a single column when using to_parquet in pandas (or Dask), they currently need to specify a schema that includes all columns: any column omitted from the schema is dropped from the resulting Parquet file.
Type inference happens when converting a Python object (e.g. a pandas DataFrame, or a dict) to an Arrow Table. Once such a table exists with a fixed schema, writing to Parquet performs no further type inference, since Arrow types map directly to Parquet types.
Proposal
There should be a way to specify the types of only a subset of columns for from_pandas, with the remaining columns keeping their inferred types.