Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
Description
I haven't looked carefully at the hot path for this, but I would expect these statements to have roughly the same performance (since the ndarray serialization should be offloaded to pickle):
In [1]: import pickle

In [2]: import numpy as np

In [3]: import pyarrow as pa

In [4]: arr = np.array(['foo', 'bar', None] * 100000, dtype=object)

In [5]: timeit serialized = pa.serialize(arr).to_buffer()
10 loops, best of 3: 27.1 ms per loop

In [6]: timeit pickled = pickle.dumps(arr)
100 loops, best of 3: 6.03 ms per loop
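For completeness, a rough standalone sketch of the same comparison using the stdlib timeit module instead of IPython's timeit magic. It assumes a pyarrow version that still ships pyarrow.serialize (the API was deprecated in later releases), and the repeat/number counts are arbitrary.

# Standalone reproduction sketch of the benchmark above.
import pickle
import timeit

import numpy as np
import pyarrow as pa

arr = np.array(['foo', 'bar', None] * 100000, dtype=object)

# Time pyarrow.serialize; to_buffer() materializes the serialized payload.
arrow_time = min(timeit.repeat(
    lambda: pa.serialize(arr).to_buffer(), number=10, repeat=3)) / 10

# Time plain pickle for comparison.
pickle_time = min(timeit.repeat(
    lambda: pickle.dumps(arr), number=10, repeat=3)) / 10

print("pyarrow.serialize: %.1f ms per call" % (arrow_time * 1e3))
print("pickle.dumps:      %.1f ms per call" % (pickle_time * 1e3))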
robertnishihara, pcmoritz: I encountered this while working on ARROW-1783, but it can likely be resolved independently.
Attachments
Issue Links
- is related to ARROW-1784 [Python] Read and write pandas.DataFrame in pyarrow.serialize by decomposing the BlockManager rather than coercing to Arrow format (Resolved)
- links to