Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
Casting an empty table from a dictionary-encoded column type to the corresponding non-dictionary-encoded value type triggers a fatal check failure. The failure is not handled, so the Python interpreter aborts.
This only happens after a Parquet roundtrip.
import pyarrow as pa
import pandas as pd
import pyarrow.parquet as pq

df = pd.DataFrame({"col": ["a"]}).astype({"col": "category"}).iloc[:0]
table = pa.Table.from_pandas(df)
field = table.schema[0]
new_field = pa.field(field.name, field.type.value_type, field.nullable, field.metadata)

buf = pa.BufferOutputStream()
pq.write_table(table, buf)
reader = pa.BufferReader(buf.getvalue().to_pybytes())
table = pq.read_table(reader)

schema = table.schema.remove(0).insert(0, new_field)
new_table = table.cast(schema)
assert new_table.schema == schema
Output
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0318 09:55:14.266649 299722176 table.cc:47] Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
Issue Links
- relates to ARROW-7907 [Python] Conversion to pandas of empty table with timestamp type aborts (Resolved)