Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
In working on ARROW-2970, I have the following dataset:
import numpy as np
import pyarrow as pa

values = [b'x'] + [b'x' * (1 << 20)] * 2 * (1 << 10)
arr = np.array(values)
arrow_arr = pa.array(arr)
The resulting arrow_arr is a ChunkedArray with 129 chunks, and nearly every element is 1MB of binary. The repr for this object is over 600MB:
In [10]: rep = repr(arrow_arr)

In [11]: len(rep)
Out[11]: 637536258
There are probably a number of failsafes we can implement to avoid badness in these pathological cases (which may not happen often, but given the kinds of bug reports we are seeing, people do have datasets that look like this). One possible shape for such a failsafe is sketched below.
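As a minimal sketch of one such failsafe: cap both the number of elements printed per chunk and the number of bytes shown per element, so the size of the output is bounded regardless of the size of the data. The helper truncated_repr and its max_elements/max_bytes limits below are hypothetical, not existing pyarrow API; it only assumes the public ChunkedArray accessors num_chunks, iterchunks(), slicing, and Scalar.as_py().

import pyarrow as pa

def truncated_repr(chunked, max_elements=3, max_bytes=32):
    # Render a ChunkedArray with a per-chunk element budget and a
    # per-element byte budget, noting how much data was elided.
    lines = ['<ChunkedArray with %d chunks>' % chunked.num_chunks]
    for i, chunk in enumerate(chunked.iterchunks()):
        lines.append('chunk %d (%d elements):' % (i, len(chunk)))
        # Show at most max_elements values from this chunk.
        for value in chunk[:max_elements]:
            raw = value.as_py()
            if raw is None:
                lines.append('  None')
                continue
            shown = repr(raw[:max_bytes])
            if len(raw) > max_bytes:
                shown += ' ... (%d bytes total)' % len(raw)
            lines.append('  ' + shown)
        if len(chunk) > max_elements:
            lines.append('  ... %d more elements' % (len(chunk) - max_elements))
    return '\n'.join(lines)

Applied to the arrow_arr above, this renders a few truncated values per chunk rather than materializing hundreds of megabytes of text.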
Attachments
Issue Links
- is a child of ARROW-18359: PrettyPrint Improvements (Open)