Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Creating an empty table:
In [1]: table = pa.table({'a': pa.array([], type=pa.timestamp('us'))})
In [2]: table['a']
Out[2]:
<pyarrow.lib.ChunkedArray object at 0x7fbb783e8098>
[
  []
]
In [3]: table.to_pandas()
Out[3]:
Empty DataFrame
Columns: [a]
Index: []
The above works, but the ChunkedArray still has one empty chunk. When filtering data, you can end up with no chunks at all, and then the conversion fails:
In [4]: table2 = table.slice(0, 0)
In [5]: table2['a']
Out[5]:
<pyarrow.lib.ChunkedArray object at 0x7fbb783aa4a8>
[
]
In [6]: table2.to_pandas()
../src/arrow/table.cc:48: Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
...
Aborted (core dumped)
This seems to happen specifically for the timestamp type, and only with a non-ns unit (e.g. us as above, which is the default in Arrow).
I noticed this when reading a parquet file of the taxi dataset, where the filter I used resulted in an empty batch.
Issue Links
- is related to ARROW-8142 [C++] Casting a chunked array with 0 chunks critical failure (Resolved)
- links to