Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Version: 0.8.0
Description
When creating a large enough array, pyarrow raises an exception:
```python
import numpy as np
import pandas as pd
import pyarrow as pa

x = list('1' * 2**31)
y = pd.DataFrame({'x': x})
t = pa.Table.from_pandas(y)
# ArrowInvalid: BinaryArray cannot contain more than 2147483646 bytes, have 2147483647
```
The array should be chunked for the user automatically. As it stands, DataFrames containing more than 2 GiB of binary data in a single column cannot be converted to Arrow.
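To illustrate the chunking the report asks for, here is a minimal sketch of splitting a string column into pieces whose cumulative UTF-8 size stays under the BinaryArray limit. The function name and the splitting strategy are illustrative assumptions, not pyarrow's actual implementation; each resulting sub-list could then be converted with `pa.array` and combined via `pa.chunked_array`.

```python
def split_by_byte_budget(values, max_bytes=2**31 - 2):
    # Split a sequence of strings into chunks whose cumulative UTF-8 byte
    # size never exceeds max_bytes (default mirrors the 2147483646-byte
    # limit from the error message). Hypothetical helper for illustration.
    chunks, current, size = [], [], 0
    for v in values:
        n = len(v.encode("utf-8"))
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(v)
        size += n
    if current:
        chunks.append(current)
    return chunks
```

With a small budget for demonstration, `split_by_byte_budget(['ab', 'cd', 'ef'], max_bytes=4)` yields `[['ab', 'cd'], ['ef']]`; each chunk would fit in its own BinaryArray.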
Issue Links
- is related to ARROW-3762: [C++] Parquet arrow::Table reads error when overflowing capacity of BinaryArray (Resolved)