Apache Arrow / ARROW-10494

.take silently overflows on list arrays (when casting to large_list is needed)

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.1, 2.0.0
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None

    Description

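      Reproducer below: taking enough elements from a list array that the resulting offsets no longer fit in int32 overflows silently instead of raising an error.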

      import numpy as np
      import pyarrow as pa
      arr = pa.array([np.arange(x).astype(np.int8) for x in range(6)])
      nb_repeat = 2**32 // arr.offsets.to_numpy()[-1]
      indices = pa.array(np.repeat(np.arange(len(arr)), nb_repeat))
      big_arr = arr.take(indices)
      print(big_arr.offsets[-5:])
      big_arr.validate()  # at least validate() catches it
      
      [
        -21,
        -16,
        -11,
        -6,
        -1
      ]
      ---------------------------------------------------------------------------
      ArrowInvalid                              Traceback (most recent call last)
      <ipython-input-1-09503f9cbb04> in <module>
            6 big_arr = arr.take(indices)
            7 print(big_arr.offsets[-5:])
      ----> 8 big_arr.validate()
      
      /opt/conda/envs/model/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.Array.validate()
      
      /opt/conda/envs/model/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
      
      ArrowInvalid: Negative offsets in list array
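The negative offsets are consistent with 64-bit offsets being truncated to 32-bit signed integers. A minimal sketch with NumPy only (no pyarrow involved; the variable names are illustrative) that reproduces the wrapped values from the offsets the large_list variant reports:

```python
import numpy as np

# Last five offsets as reported by the large_list variant of the reproducer.
true_offsets = np.array(
    [4294967275, 4294967280, 4294967285, 4294967290, 4294967295],
    dtype=np.int64,
)

# Truncating to int32 wraps modulo 2**32, which yields the negative
# offsets printed for the plain list array.
wrapped = true_offsets.astype(np.int32)
print(wrapped.tolist())
```

This only illustrates the arithmetic; it does not show where in the take kernel the truncation happens.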
      

      The same code works fine with large_list (as expected):

      
      import numpy as np
      import pyarrow as pa
      arr = pa.array([np.arange(x).astype(np.int8) for x in range(6)], type=pa.large_list(pa.int8()))
      nb_repeat = 2**32 // arr.offsets.to_numpy()[-1]
      indices = pa.array(np.repeat(np.arange(len(arr)), nb_repeat))
      big_arr = arr.take(indices)
      print(big_arr.offsets[-5:])
      big_arr.validate()
      [
        4294967275,
        4294967280,
        4294967285,
        4294967290,
        4294967295
      ]
      

            People

              Assignee: Unassigned
              Reporter: Artem KOZHEVNIKOV