Apache Arrow / ARROW-13342

[Python] Categorical boolean column saved as regular boolean in parquet


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version: 4.0.1
    • Fix Version: None
    • Components: Parquet, Python
    • Labels: None

    Description

      When saving a pandas DataFrame to Parquet, a categorical column whose categories are boolean is written as a regular boolean column.

      This causes an issue because, when reading back the parquet file, I expect the column to still be categorical.

       
      Reproducible example:

      import pandas as pd
      import pyarrow
      import pyarrow.parquet  # the parquet submodule must be imported explicitly
      
      # Create dataframe with boolean column that is then converted to categorical
      df = pd.DataFrame({'a': [True, True, False, True, False]})
      df['a'] = df['a'].astype('category')
      
      # Convert to arrow Table and save to disk
      table = pyarrow.Table.from_pandas(df)
      pyarrow.parquet.write_table(table, 'test.parquet')
      
      # Reload data and convert back to pandas
      table_rel = pyarrow.parquet.read_table('test.parquet')
      df_rel = table_rel.to_pandas()
      

      The Arrow table built from the dataframe correctly represents the column as an Arrow dictionary type:

      >>> df['a']
      0     True
      1     True
      2    False
      3     True
      4    False
      Name: a, dtype: category
      Categories (2, object): [False, True]
      >>>
      >>> table
      pyarrow.Table
      a: dictionary<values=bool, indices=int8, ordered=0>
      

      However, the reloaded column is now a regular boolean:

      >>> table_rel
      pyarrow.Table
      a: bool
      >>>
      >>> df_rel['a']
      0     True
      1     True
      2    False
      3     True
      4    False
      Name: a, dtype: bool
      

      I would have expected the column to be read back as categorical.
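
A possible stopgap, sketched here under assumptions and not part of the original report: if the dtypes of the original dataframe are still available, the affected columns can be re-cast after loading. The helper name `restore_categoricals` is hypothetical, and the second dataframe below is an in-memory stand-in for what `read_table('test.parquet').to_pandas()` returned above.

```python
import pandas as pd


def restore_categoricals(reloaded: pd.DataFrame, original_dtypes: pd.Series) -> pd.DataFrame:
    """Re-cast columns that were categorical before the Parquet round trip.

    Hypothetical helper: `original_dtypes` is the `.dtypes` of the dataframe
    as it was before being written.
    """
    out = reloaded.copy()
    for name, dtype in original_dtypes.items():
        # Only touch columns that were categorical and still exist after reload
        if isinstance(dtype, pd.CategoricalDtype) and name in out.columns:
            out[name] = out[name].astype(dtype)
    return out


# Same dataframe as in the reproducible example
df = pd.DataFrame({'a': [True, True, False, True, False]})
df['a'] = df['a'].astype('category')

# Stand-in for the reloaded dataframe, where 'a' came back as plain bool
df_rel = pd.DataFrame({'a': [True, True, False, True, False]})

df_fixed = restore_categoricals(df_rel, df.dtypes)
```

This only papers over the symptom on the pandas side; the Parquet file itself still stores the column as a plain boolean.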

            People

              Assignee: Unassigned
              Reporter: Joao Moreira (jmoreira)