Details
Improvement
Status: Open
Minor
Resolution: Unresolved
1.0.0
None
None
Windows, Pyarrow 1.0.0
Description
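I would like to report a performance difference observed between pandas and PyArrow: multiplying a column of 100 million random floats element-wise by itself takes roughly twice as long with pyarrow.compute.multiply as with DataFrame.multiply, and the PyArrow timings vary much more between runs.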
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.compute as pc

df = pd.DataFrame(np.random.randn(100000000))
%timeit -n 5 -r 5 df.multiply(df)

table = pa.Table.from_pandas(df)
%timeit -n 5 -r 5 pc.multiply(table[0],table[0])
Results:
%timeit -n 5 -r 5 df.multiply(df)
374 ms ± 15.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)

%timeit -n 5 -r 5 pc.multiply(table[0],table[0])
698 ms ± 297 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)