Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- None
Description
When creating an array from a Python sequence with pa.array(..., mask=...), Arrow raises an exception unless:
- the mask is a NumPy array
- the mask's dtype is bool
- the mask has the same length as the sequence
- the mask is one-dimensional
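The four conditions above can be expressed as a small standalone check. This is a minimal sketch, not pyarrow's implementation. The name `check_mask` is hypothetical, and so are the details: the checks run in an order chosen here, a non-array mask raises `TypeError`, and the other three failures raise `ValueError` to match the reproduction tests below. Those exception types are assumptions.

```python
import numpy as np


def check_mask(mask, length):
    """Hypothetical helper mirroring the four mask conditions listed above.

    Raises TypeError if the mask is not a NumPy array (assumed exception type),
    and ValueError for a wrong dtype, shape, or length.
    """
    if not isinstance(mask, np.ndarray):
        raise TypeError(f"mask must be a numpy array, got {type(mask).__name__}")
    if mask.dtype != np.bool_:
        raise ValueError(f"mask must have dtype bool, got {mask.dtype}")
    # Check dimensionality before length: len() of a (1, n) array is 1,
    # which would otherwise surface as a misleading length error.
    if mask.ndim != 1:
        raise ValueError(f"mask must be 1-dimensional, got ndim={mask.ndim}")
    if mask.shape[0] != length:
        raise ValueError(
            f"mask length {mask.shape[0]} does not match data length {length}"
        )
    return mask
```

A caller could run `check_mask(mask, len(obj))` before `pa.array(obj, mask=mask)`. That gives a NumPy input the same validation that a sequence input already gets.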
However, when creating an array from a NumPy array, these checks are not performed: no exception is raised for an invalid mask, which can lead to surprising results.
Example:
import pytest
import pyarrow as pa
import numpy as np


def test_numpy_masked():
    # This test fails, because no exceptions are raised
    n = 100
    obj = np.arange(n)
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([None] * n, dtype="O"))  # wrong dtype
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([False] * (n // 2)))  # wrong length
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([False] * n, ndmin=2))  # wrong shape


def test_sequence_masked():
    # This test passes, since exceptions are raised as expected
    n = 100
    obj = np.arange(n).tolist()
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([None] * n, dtype="O"))  # wrong dtype
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([False] * (n // 2)))  # wrong length
    with pytest.raises(ValueError):
        arr = pa.array(obj, mask=np.array([False] * n, ndmin=2))  # wrong shape


if __name__ == "__main__":
    pytest.main(args=[__file__])