  1. Apache Arrow
  2. ARROW-2227

[Python] Table.from_pandas does not create chunked_arrays.

    Description

      When a data frame with a large enough string column is converted to an Arrow table, pyarrow raises an exception:

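      import pandas as pd
      import pyarrow as pa

      # 2**31 one-character strings, i.e. 2147483648 bytes of string data in one column
      x = list('1' * 2**31)
      y = pd.DataFrame({'x': x})
      # The conversion tries to build a single array for the column and fails
      t = pa.Table.from_pandas(y)
      # ArrowInvalid: BinaryArrow cannot contain more than 2147483646 bytes, have 2147483647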

      Table.from_pandas should split such a column into a chunked array for the user. As is, a data frame with more than 2 GiB of binary data in one column cannot be converted to Arrow in a single call.
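
Until the conversion chunks automatically, one possible workaround is to split the rows by hand so that no piece exceeds the limit quoted in the error message. The sketch below only plans the row ranges. The names `plan_chunks` and `MAX_CHUNK_BYTES` are hypothetical helpers for illustration, not part of pyarrow, and the limit is taken from the error text above rather than from the library.

```python
# Largest payload a single binary array accepted, according to the error
# message in this report (assumption: the same limit applies per chunk).
MAX_CHUNK_BYTES = 2**31 - 2  # 2147483646


def plan_chunks(byte_lengths, limit=MAX_CHUNK_BYTES):
    """Return (start, stop) row ranges whose total byte size stays <= limit.

    byte_lengths is a list with the encoded size in bytes of each row's value.
    """
    ranges = []
    start = 0      # first row of the chunk currently being filled
    running = 0    # bytes accumulated in the current chunk
    for i, n in enumerate(byte_lengths):
        if n > limit:
            raise ValueError(f"row {i} alone holds {n} bytes, over the limit of {limit}")
        if running + n > limit:
            # Close the current chunk just before row i and start a new one.
            ranges.append((start, i))
            start, running = i, 0
        running += n
    if start < len(byte_lengths):
        # Emit the final, possibly partial, chunk.
        ranges.append((start, len(byte_lengths)))
    return ranges
```

Each range could then be sliced out with `y.iloc[start:stop]`, converted with `pa.Table.from_pandas`, and the pieces combined with `pa.concat_tables`, which keeps them as separate chunks of one column. For the 2**31 one-byte strings above, this plan would yield two ranges. This is untested at that scale and is meant only to show the idea.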

            People

              wesm Wes McKinney
              LeftScreenCorner Left Screen