Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.11.1
Description
Pyarrow 0.8 and 0.9 raise an AssertionError for one of the datasets I have (created using an older version of pyarrow). Repro steps:
In [1]: from pyarrow.parquet import ParquetDataset
In [2]: d = ParquetDataset(['bug.parq'])
In [3]: t = d.read()
In [4]: t.to_pandas()
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-4-d17c9e2818f1> in <module>()
----> 1 t.to_pandas()
table.pxi in pyarrow.lib.Table.to_pandas()
~/envs/cli3/lib/python3.6/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, memory_pool, nthreads, categories)
529 # There must be the same number of field names and physical names
530 # (fields in the arrow Table)
--> 531 assert len(logical_index_names) == len(index_columns_set)
532
533 # It can never be the case in a released version of pyarrow that
AssertionError:
Here's the file: https://www.dropbox.com/s/oja3khjsc5tycfh/bug.parq
(I was not able to attach it here due to a "missing token", whatever that means.)
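The failing assertion compares two views of the index columns recorded in the file's pandas metadata, so one way to narrow this down is to inspect that metadata directly. The helper below is only a diagnostic sketch: the function name and the returned keys are made up for illustration, and it does not establish the actual cause of this bug. It assumes the metadata is stored as JSON under the b"pandas" key of the schema metadata.

```python
import json
from collections import Counter


def summarize_pandas_index_metadata(schema_metadata, column_names):
    """Report how the stored pandas index columns line up with the table columns.

    schema_metadata: mapping of bytes -> bytes (e.g. an Arrow schema's metadata), or None
    column_names: names of the columns physically present in the table
    """
    raw = (schema_metadata or {}).get(b"pandas")
    if raw is None:
        return None  # no pandas metadata stored with the file

    meta = json.loads(raw.decode("utf-8"))
    index_columns = meta.get("index_columns", [])

    # Assumption: entries are normally column-name strings; anything else
    # (e.g. a dict descriptor written by newer versions) is skipped here.
    names = [c for c in index_columns if isinstance(c, str)]
    counts = Counter(names)

    return {
        "index_columns": index_columns,
        # Names listed more than once would shrink when put into a set
        "duplicated": sorted(n for n, k in counts.items() if k > 1),
        # Names listed as index columns but absent from the table
        "missing_from_table": sorted(set(names) - set(column_names)),
    }
```

With pyarrow installed, this could be called as `schema = pyarrow.parquet.ParquetFile('bug.parq').schema.to_arrow_schema()` followed by `summarize_pandas_index_metadata(schema.metadata, schema.names)`. A non-empty "duplicated" or "missing_from_table" list would point to metadata that the newer reader does not expect.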