Apache Arrow / ARROW-18001

[Python] Provide a way to specify the type of a subset of columns for from_pandas


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: Python

    Description

      This question came up in a GitHub issue: https://github.com/apache/arrow/issues/14025

      Description:

      If a user wants to change the type of a single column when using to_parquet in pandas (or dask), they currently need to pass a schema that lists every column. Any column that is not listed in the schema is left out of the Parquet file.

      Type inference happens when a Python object (e.g. a pandas DataFrame or a dict) is converted to an Arrow Table. Once the table exists with a fixed schema, writing to Parquet no longer performs any type inference, since Arrow types map directly to Parquet types.

      Proposal

      from_pandas should offer a way to specify the type of only a subset of columns, with the types of the remaining columns still inferred.

          People

            Assignee: Unassigned
            Reporter: Alenka Frim