Apache Arrow / ARROW-10742

[Python] Mask not checked when creating array from numpy array

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Python

    Description

      When creating an array from a Python sequence with a mask, Arrow raises an exception unless:

      • mask is a NumPy array
      • mask has dtype bool
      • mask has the same length as the sequence
      • mask is 1-dimensional
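The four checks listed above can be sketched in plain NumPy as a pre-validation step that a caller could run on the mask before calling pa.array(obj, mask=mask), whatever the type of obj. This is a minimal sketch: the helper name check_mask is hypothetical, it is not part of pyarrow's API, and the exception types chosen here (TypeError for a non-array mask, ValueError otherwise) are assumptions that may not match pyarrow's own errors or messages.

```python
import numpy as np


def check_mask(mask, length):
    """Hypothetical helper mirroring the four mask checks.

    Raises instead of silently accepting a malformed mask. The exception
    types are an assumption, not pyarrow's documented behaviour.
    """
    # 1. The mask must be a NumPy array, not a list or other sequence.
    if not isinstance(mask, np.ndarray):
        raise TypeError("mask must be a numpy array")
    # 2. The mask must be 1-dimensional (e.g. shape (1, n) is rejected).
    if mask.ndim != 1:
        raise ValueError(f"mask must be 1-dimensional, got ndim={mask.ndim}")
    # 3. The mask must have exactly one entry per input value.
    if mask.shape[0] != length:
        raise ValueError(
            f"mask length {mask.shape[0]} does not match data length {length}"
        )
    # 4. The mask dtype must be bool (object dtype, ints, etc. are rejected).
    if mask.dtype != np.dtype(bool):
        raise ValueError(f"mask dtype must be bool, got {mask.dtype}")
    return mask
```

Used as a guard, check_mask(mask, len(obj)) would reject the three malformed masks from the tests below (object dtype, half length, shape (1, n)) before they ever reach pa.array.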

      https://github.com/apache/arrow/blob/d542482bdc6bea8a449f000bdd74de8990c20015/cpp/src/arrow/python/iterators.h#L98-L124

      However, when creating an array from a NumPy array, these checks are not performed, so an invalid mask is silently accepted, which can lead to surprising results.

      Example:

      import pytest
      import pyarrow as pa
      import numpy as np
      
      
      def test_numpy_masked():
          # This test fails, because no exceptions are raised
          n = 100
          obj = np.arange(n)
          with pytest.raises(ValueError):
              arr = pa.array(obj, mask=np.array([None] * n, dtype="O"))  # wrong dtype
          with pytest.raises(ValueError):
              arr = pa.array(obj, mask=np.array([False] * (n // 2)))  # wrong length
          with pytest.raises(ValueError):
              arr = pa.array(obj, mask=np.array([False] * n, ndmin=2))  # wrong shape
      
      
      def test_sequence_masked():
          # This test passes, since exceptions are raised as expected
          n = 100
          obj = np.arange(n).tolist()
          with pytest.raises(ValueError):
              arr = pa.array(obj, mask=np.array([None] * n, dtype="O"))  # wrong dtype
          with pytest.raises(ValueError):
              arr = pa.array(obj, mask=np.array([False] * (n // 2)))  # wrong length
          with pytest.raises(ValueError):
              arr = pa.array(obj, mask=np.array([False] * n, ndmin=2))  # wrong shape
      
      
      if __name__ == "__main__":
          pytest.main(args=[__file__])
      
      


    People

      Assignee: Christian Lundgren (chrisavl)
      Reporter: Christian Lundgren (chrisavl)


    Time Tracking

      Original Estimate: Not Specified
      Remaining Estimate: 0h
      Time Spent: 1h
