Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
A reproducer in Python:
import pyarrow as pa
import pyarrow.parquet as pq


class MyStructType(pa.PyExtensionType):
    def __init__(self):
        pa.PyExtensionType.__init__(
            self, pa.struct([('left', pa.int64()), ('right', pa.int64())])
        )

    def __reduce__(self):
        return MyStructType, ()


struct_array = pa.StructArray.from_arrays(
    [
        pa.array([0, 1], type="int64", from_pandas=True),
        pa.array([1, 2], type="int64", from_pandas=True),
    ],
    names=["left", "right"],
)

# works
table = pa.table({'a': struct_array})
pq.write_table(table, "test_struct.parquet")

# doesn't work
mystruct_array = pa.ExtensionArray.from_storage(MyStructType(), struct_array)
table = pa.table({'a': mystruct_array})
pq.write_table(table, "test_struct.parquet")
Writing the plain StructArray to Parquet now works, as does reading it back in.
But when the struct array is the storage array of an ExtensionType, it fails with the following error:
ArrowException: Unknown error: data type leaf_count != builder_leaf_count1 2
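One possible reading of the "1 2" at the end of the message is that the two sides of the check count leaf columns differently: the struct storage has two int64 leaves, while an extension type that is not unwrapped looks like a single column. The sketch below only illustrates that idea with plain dicts standing in for Arrow types. The `count_leaves` helper and the dict layout are hypothetical, not pyarrow or Parquet writer code, and the ticket does not confirm that this is the actual cause.

```python
# Illustrative model only: plain dicts stand in for Arrow data types.
# `count_leaves` is a hypothetical helper, not part of pyarrow.
def count_leaves(dtype, unwrap_extension=True):
    """Count the primitive leaf columns in a nested type description."""
    kind = dtype["kind"]
    if kind == "extension":
        if not unwrap_extension:
            # Treat the extension type as one opaque column.
            return 1
        # Look through the extension type at its storage type.
        return count_leaves(dtype["storage"], unwrap_extension)
    if kind == "struct":
        return sum(count_leaves(f, unwrap_extension) for f in dtype["fields"])
    # Any other kind is treated as a primitive leaf.
    return 1


int64 = {"kind": "int64"}
struct_type = {"kind": "struct", "fields": [int64, int64]}
ext_type = {"kind": "extension", "storage": struct_type}

leaves_unwrapped = count_leaves(ext_type)
leaves_opaque = count_leaves(ext_type, unwrap_extension=False)
print(leaves_opaque, leaves_unwrapped)
```

In this model the two counts differ for the extension-wrapped struct but agree for the plain struct, which matches the observation that only the extension case fails.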