  1. Apache Arrow
  2. ARROW-4492

[Python] Failure reading Parquet column as pandas Categorical in 0.12

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.12.1
    • Component/s: Python

    Description

      On pyarrow 0.12.0 some (but not all) columns cannot be read as category dtype. Attached is an extracted failing sample.

       

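      import dask.dataframe as dd

      # Read the attached sample, requesting 'slug' as a pandas Categorical
      df = dd.read_parquet('slug.pq', categories=['slug'], engine='pyarrow').compute()
      # Number of categories in the resulting column
      print(len(df['slug'].dtype.categories))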
       

      This works on pyarrow 0.11.1 (and fastparquet 0.2.1).

      Attachments

        1. slug.pq
          6.88 MB
          George Sakkis

            People

              Assignee: Unassigned
              Reporter: George Sakkis (gsakkis)
              Votes: 0
              Watchers: 5
