Apache Arrow / ARROW-2227

[Python] Table.from_pandas does not create chunked_arrays.

    Details

      Description

      Converting a DataFrame whose string column holds about 2 GiB of data makes pyarrow raise an exception:

      import pandas as pd
      import pyarrow as pa
      
      # 2**31 one-character strings, just over the byte limit reported in the error below
      x = list('1' * 2**31)
      y = pd.DataFrame({'x': x})
      t = pa.Table.from_pandas(y)
      # ArrowInvalid: BinaryArray cannot contain more than 2147483646 bytes, have 2147483647

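      Table.from_pandas should split such a column into a chunked array for the user. As is, a data frame with roughly 2 GiB or more of binary or string data in a single column cannot be converted to an Arrow table in one call.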

              People

              • Assignee: Wes McKinney (wesmckinn)
              • Reporter: Chris Ellison (LeftScreenCorner)
