  1. Apache Arrow
  2. ARROW-2367

[Python] ListArray has trouble with sizes greater than kMaximumCapacity

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 2.0.0
    • Component/s: Python
    • Labels: None

    Description

      Converting a pandas DataFrame that has a column of lists to a pyarrow.Table object raises the following error:

      Traceback (most recent call last):
      File "arrow-2227.py", line 16, in <module>
      arr = pa.array(df['strings'], from_pandas=True)
      File "array.pxi", line 177, in pyarrow.lib.array
      File "error.pxi", line 77, in pyarrow.lib.check_status
      File "error.pxi", line 77, in pyarrow.lib.check_status
      pyarrow.lib.ArrowInvalid: BinaryArray cannot contain more than 2147483646 bytes, have 2147483647
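
      The two numbers in the error message sit right at the edge of a signed 32-bit integer. The sketch below only checks that arithmetic. Reading the limit as "largest int32 value minus one" is an interpretation of the message, not something confirmed in this report.

```python
# Largest value representable as a signed 32-bit integer.
INT32_MAX = 2**31 - 1

# Values copied from the ArrowInvalid message above.
reported_limit = 2147483646
reported_size = 2147483647

# The size that was rejected is exactly INT32_MAX ...
assert reported_size == INT32_MAX
# ... and the reported limit is one byte below it.
assert reported_limit == INT32_MAX - 1

# How far over the limit the data is.
overflow = reported_size - reported_limit
print(overflow)
```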
      

      The following code was used to generate the error (adapted from ARROW-2227):

      import pandas as pd
      import pyarrow as pa
      
      # The commented-out lines test a non-binary (str) data type; both variants cause the same error
      v1 = b'x' * 100000000
      v2 = b'x' * 147483646
      # v1 = 'x' * 100000000
      # v2 = 'x' * 147483646
      
      df = pd.DataFrame({
           'strings': [[v1]] * 20 + [[v2]] + [[b'x']]
           # 'strings': [[v1]] * 20 + [[v2]] + [['x']]
      })
      arr = pa.array(df['strings'], from_pandas=True)
      assert isinstance(arr, pa.ChunkedArray), type(arr)
      

      Code was run using Python 3.6 with PyArrow installed from conda-forge on macOS High Sierra.
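
      The payload sizes in the script add up to exactly one byte more than the limit in the error message, which is why the script expects a ChunkedArray rather than a single array. The sketch below checks that sum and shows one hypothetical way to group rows so that each group stays under the limit. The helper plan_chunks is made up for illustration and is not part of pyarrow. It counts only payload bytes and makes no claim about how pyarrow itself chunks data.

```python
def plan_chunks(sizes, limit):
    """Greedily group row indices so each group's total size is <= limit.

    Hypothetical helper for illustration only; not a pyarrow function.
    """
    chunks = []
    current = []
    current_bytes = 0
    for index, size in enumerate(sizes):
        if size > limit:
            raise ValueError(f"row {index} alone exceeds the limit")
        # Start a new group when adding this row would exceed the limit.
        if current and current_bytes + size > limit:
            chunks.append(current)
            current = []
            current_bytes = 0
        current.append(index)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks


# Byte sizes of the rows built in the reproduction script:
# twenty rows of v1, one row of v2, one row of b'x'.
sizes = [100_000_000] * 20 + [147_483_646] + [1]
limit = 2147483646  # limit quoted in the error message

total = sum(sizes)
plan = plan_chunks(sizes, limit)
print(total, [len(chunk) for chunk in plan])
```

      Under these assumptions the first 21 rows fit exactly within the limit and the final one-byte row has to go into a second group.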

            People

              Assignee: kszucs Krisztian Szucs
              Reporter: bmenn Bryant Menn
              Votes: 0
              Watchers: 6
