Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
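Compared with scipy.stats.mode, Arrow's mode kernel (pyarrow.compute.mode) is much slower under some conditions. In the example below, on 12,345,678 random floats (every value distinct, so the mode count is 1) it takes 8.77 s wall time versus 1.25 s for SciPy. On 12,345,678 random integers drawn from [0, 1234567) the gap is smaller: 1.57 s versus 1.03 s.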
In [1]: import numpy as np

In [2]: import scipy.stats

In [3]: import pyarrow.compute as pc

In [4]: f = np.random.rand(12345678)

In [5]: time scipy.stats.mode(f)
CPU times: user 1.14 s, sys: 111 ms, total: 1.25 s
Wall time: 1.25 s
Out[5]: ModeResult(mode=array([2.25710692e-08]), count=array([1]))

In [6]: time pc.mode(f)[0]
CPU times: user 8.44 s, sys: 338 ms, total: 8.77 s
Wall time: 8.77 s
Out[6]: <pyarrow.StructScalar: {'mode': 2.2571069235866048e-08, 'count': 1}>

In [7]: i = np.random.randint(0, 1234567, 12345678)

In [8]: time scipy.stats.mode(i)
CPU times: user 1.03 s, sys: 3.11 ms, total: 1.03 s
Wall time: 1.03 s
Out[8]: ModeResult(mode=array([607002]), count=array([28]))

In [9]: time pc.mode(i)[0]
CPU times: user 1.57 s, sys: 0 ns, total: 1.57 s
Wall time: 1.57 s
Out[9]: <pyarrow.StructScalar: {'mode': 607002, 'count': 28}>
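One plausible reading of these numbers, not confirmed anywhere in this issue, is that the cost depends on how the mode is computed. A hash-based counter needs one table entry per distinct value, so the float case (about 12.3 million distinct values) builds a table roughly ten times larger than the integer case (at most 1,234,567 distinct values, about 10 occurrences each). A sort-based approach pays for one sort regardless of how many values are distinct. The sketch below contrasts the two strategies in plain Python. The function names mode_hash and mode_sort are hypothetical, and neither function is claimed to be what Arrow or SciPy actually implements. Both assume non-empty input without NaNs, and both break ties toward the smallest value, matching the results in the session above.

```python
from collections import Counter


def mode_hash(values):
    """Hash-based mode (hypothetical helper): count every value, then pick the best.

    The Counter holds one entry per distinct value, so an input where every
    value is distinct produces a table as large as the input itself.
    Returns (mode, count); ties go to the smallest value.
    """
    counts = Counter(values)
    best_count = max(counts.values())
    best_value = min(v for v, c in counts.items() if c == best_count)
    return best_value, best_count


def mode_sort(values):
    """Sort-based mode (hypothetical helper): sort once, then scan runs of equal values.

    Runs are visited in ascending order and only a strictly larger count
    replaces the current best, so ties go to the smallest value.
    Returns (mode, count).
    """
    ordered = sorted(values)
    best_value, best_count = ordered[0], 0
    run_value, run_count = ordered[0], 0
    for v in ordered:
        if v == run_value:
            run_count += 1
        else:
            # A run just ended: keep it if it beats the best so far.
            if run_count > best_count:
                best_value, best_count = run_value, run_count
            run_value, run_count = v, 1
    # Close the final run.
    if run_count > best_count:
        best_value, best_count = run_value, run_count
    return best_value, best_count


if __name__ == "__main__":
    print(mode_hash([3, 1, 3, 2, 1]))  # (1, 2)
    print(mode_sort([3, 1, 3, 2, 1]))  # (1, 2)
```

Both functions agree on their output. What differs is the work they do: the hash version's memory and cache behaviour scale with the number of distinct values, while the sort version's cost is dominated by the sort. Pure-Python timings of these sketches would not reproduce the C++ numbers above, so they illustrate the trade-off only.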