Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Versions: 3.0.0, 4.0.0, 4.0.1
- Environment: Python 3.7, Ubuntu 20.04
Description
PyArrow crashes when `filter` or `take` is applied to an extension array that is already empty.
The bug can be reproduced with the documentation example:
    import pyarrow as pa


    class Point3DArray(pa.ExtensionArray):
        def to_numpy_array(self):
            return self.storage.flatten().to_numpy().reshape((-1, 3))


    class Point3DType(pa.PyExtensionType):
        def __init__(self):
            pa.PyExtensionType.__init__(self, pa.list_(pa.float32(), 3))

        def __reduce__(self):
            return Point3DType, ()

        def __arrow_ext_class__(self):
            return Point3DArray


    storage = pa.array([[1, 2, 3], [4, 5, 6]], pa.list_(pa.float32(), 3))
    arr = pa.ExtensionArray.from_storage(Point3DType(), storage)
    arr = arr.filter(pa.array([False, False]))

    # Crashing here...
    arr.filter(pa.array([], pa.bool_()))
    # Crashing as well...
    arr.take(pa.array([], pa.int32()))
The underlying issue seems to be that the function `nulls` is not implemented for extension types in the C++ codebase: https://github.com/apache/arrow/blob/6db88a9e946c98c59f179210a70bc05ef6a0a296/cpp/src/arrow/array/util.cc#L472