Apache Arrow / ARROW-3654

[Python] Column with CategoricalIndex fails to be read back


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.11.1
    • Fix Version/s: None
    • Component/s: Python

    Description

      When a DataFrame whose index is a CategoricalIndex is written to Parquet, the data cannot be read back.

       

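      import pandas as pd
      import pyarrow as pa
      import pyarrow.parquet as pq

      # Build a DataFrame and make the categorical column 'c1' its index
      df = pd.DataFrame([['a', 'b'], ['c', 'd']], columns=['c1', 'c2'])
      df['c1'] = df['c1'].astype('category')
      df = df.set_index(['c1'])

      # Write the table to Parquet
      table = pa.Table.from_pandas(df)
      pq.write_table(table, 'test.parquet')

      # Reading it back fails with KeyError: 'categorical' on pyarrow 0.11.1
      pq.read_pandas('test.parquet').to_pandas()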
      

      This results in:

      KeyError                                  Traceback (most recent call last)
      ~/venv/mpptool/lib/python3.7/site-packages/pyarrow/pandas_compat.py in _pandas_type_to_numpy_type(pandas_type)
          676     try:
      --> 677         return _pandas_logical_type_map[pandas_type]
          678     except KeyError:
      
      KeyError: 'categorical'
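The traceback shows the reader resolving the stored pandas_type through a dictionary lookup, and 'categorical' has no entry in that dictionary. The sketch below is a hypothetical, simplified model of that lookup: the map contents and the function names (to_numpy_type_strict, resolve_dtype) are assumptions made for illustration, not pyarrow's actual code. It shows why the lookup raises and one way a reader could special-case the categorical logical type instead.

```python
# Hypothetical, simplified model of the lookup named in the traceback.
# The map entries are illustrative assumptions, not pyarrow's real table.
LOGICAL_TYPE_MAP = {
    "string": "object",
    "unicode": "object",
    "bytes": "object",
    "datetime": "datetime64[ns]",
}


def to_numpy_type_strict(pandas_type):
    """Mirrors the failing behaviour: an unmapped logical type raises KeyError."""
    return LOGICAL_TYPE_MAP[pandas_type]


def resolve_dtype(pandas_type):
    """Hypothetical fix: handle the categorical logical type before the map lookup."""
    if pandas_type == "categorical":
        # Categoricals are a pandas extension dtype with no plain NumPy equivalent
        return "category"
    return LOGICAL_TYPE_MAP[pandas_type]


# pandas_type value taken from the column_indexes metadata in this report
try:
    to_numpy_type_strict("categorical")
except KeyError as exc:
    print("KeyError:", exc)  # prints: KeyError: 'categorical'

print(resolve_dtype("categorical"))  # prints: category
```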
      

      The pandas metadata stored in the schema looks correct:

      "column_indexes": [{"name": "c1", "field_name": "c1", "pandas_type": "categorical", "numpy_type": "int8", "metadata": {"num_categories": 2, "ordered": false}}]
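pyarrow keeps this pandas metadata as a JSON document under the b'pandas' key of the Arrow schema metadata (for example table.schema.metadata[b'pandas']). A minimal sketch, assuming only the fragment quoted above is at hand, that parses it and reads out the fields relevant to this bug; the fragment is wrapped in braces so it parses as a standalone JSON object, and the rest of the metadata is omitted.

```python
import json

# The column_indexes fragment quoted in this report, wrapped in braces so it
# parses as a standalone JSON object (the rest of the metadata is omitted).
fragment = (
    '{"column_indexes": [{"name": "c1", "field_name": "c1", '
    '"pandas_type": "categorical", "numpy_type": "int8", '
    '"metadata": {"num_categories": 2, "ordered": false}}]}'
)

meta = json.loads(fragment)

for entry in meta["column_indexes"]:
    # pandas_type is the logical type the reader must map back to a dtype;
    # numpy_type is the physical dtype of the stored categorical codes.
    print(entry["name"], entry["pandas_type"], entry["numpy_type"],
          entry["metadata"]["num_categories"], entry["metadata"]["ordered"])
# prints: c1 categorical int8 2 False
```

The entry carries everything needed to rebuild a categorical (logical type, codes dtype, category count, ordering), which is consistent with the reporter's observation that the failure lies in the lookup rather than in the written metadata.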
      



          People

            Assignee: Unassigned
            Reporter: Armin Berres (aberres)
            Votes: 0
            Watchers: 4
