Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Related to https://issues.apache.org/jira/browse/ARROW-439
pandas Categorical types are not implemented: writing a table that contains a Categorical column to Parquet raises ArrowNotImplementedError. Minimal example:
pandas 0.20.3 & pyarrow 0.5.0
In [1]: df = pd.DataFrame({'a': pd.Categorical(list('abc'))})

In [2]: df.dtypes
Out[2]:
a    category
dtype: object

In [4]: import pyarrow

In [5]: import pyarrow.parquet

In [6]: table = pyarrow.Table.from_pandas(df, timestamps_to_ms=True)
   ...: pyarrow.parquet.write_table(
   ...:     table, 'foo.pq')
   ...:
   ...:
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
<ipython-input-6-4512e9a2e15e> in <module>()
      1 table = pyarrow.Table.from_pandas(df, timestamps_to_ms=True)
      2 pyarrow.parquet.write_table(
----> 3     table, 'foo.pq')
      4

/Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, **kwargs)
    770         version=version,
    771         use_deprecated_int96_timestamps=use_deprecated_int96_timestamps)
--> 772     writer = ParquetWriter(where, table.schema, **options)
    773     writer.write_table(table, row_group_size=row_group_size)
    774     writer.close()

_parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()

error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: NotImplemented: unhandled type
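
For context on what the writer would need to handle: a pandas Categorical is stored as integer codes plus a list of categories, with -1 marking a missing value. The pure-Python sketch below only illustrates that representation. The helpers dictionary_encode and dictionary_decode are hypothetical names, not pandas or pyarrow functions, and they number categories in order of first appearance, whereas pandas sorts inferred categories where it can.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def dictionary_encode(
    values: Sequence[Optional[Hashable]],
) -> Tuple[List[int], List[Hashable]]:
    """Split values into (codes, categories); None is encoded as -1."""
    categories: List[Hashable] = []
    positions: Dict[Hashable, int] = {}
    codes: List[int] = []
    for value in values:
        if value is None:
            # Missing values get the sentinel code and no category entry.
            codes.append(-1)
            continue
        if value not in positions:
            # First time this value is seen: assign the next free code.
            positions[value] = len(categories)
            categories.append(value)
        codes.append(positions[value])
    return codes, categories


def dictionary_decode(
    codes: Sequence[int], categories: Sequence[Hashable]
) -> List[Optional[Hashable]]:
    """Rebuild the original values from codes and categories."""
    return [None if code == -1 else categories[code] for code in codes]


if __name__ == "__main__":
    codes, categories = dictionary_encode(list("abca"))
    print(codes, categories)
    print(dictionary_decode(codes, categories))
```

A column containing only None encodes to all -1 codes with an empty category list, which is the edge case named in the linked PARQUET-1015.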
Issue Links
- blocks: PARQUET-1015 Object categoricals are not serialized when only None is present (Resolved)