Apache Arrow / ARROW-4492

[Python] Failure reading Parquet column as pandas Categorical in 0.12


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.12.1
    • Component/s: Python

      Description

      On pyarrow 0.12.0, some (but not all) columns cannot be read as category dtype. Attached is an extracted sample file (slug.pq) that fails.

       

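      import dask.dataframe as dd

      # Read the attached sample, asking for 'slug' as a pandas Categorical.
      df = dd.read_parquet('slug.pq', categories=['slug'], engine='pyarrow').compute()
      # On success this prints the number of distinct categories in 'slug'.
      print(len(df['slug'].dtype.categories))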
       

      This works on pyarrow 0.11.1 (and fastparquet 0.2.1).

        Attachments

        1. slug.pq
          6.88 MB
          George Sakkis


              People

              • Assignee: Unassigned
              • Reporter: George Sakkis (gsakkis)
              • Votes: 0
              • Watchers: 4
