Details

Type: Bug

Status: Resolved

Priority: Major

Resolution: Fixed

None
Description
As the test below shows, Arrow's summation kernel (pyarrow.compute.sum) is less precise than numpy.sum.
NumPy implements pairwise summation [1], whose roundoff error grows as O(log n), compared with O(n) for naive sequential summation.
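The difference between the two strategies can be sketched in a few lines of pure Python. This is an illustration only, not NumPy's C routine (which differs in details such as block size and loop unrolling) and not Arrow's kernel. The names naive_sum and pairwise_sum and the block size of 8 are hypothetical choices. The data rebuilds t2 from sum.py without NumPy.

```python
def naive_sum(xs):
    """Sequential left-to-right sum; worst-case roundoff grows as O(n)."""
    total = 0.0
    for x in xs:
        total += x
    return total


def pairwise_sum(xs, lo=0, hi=None, block=8):
    """Recursive pairwise sum of xs[lo:hi]; worst-case roundoff grows as O(log n).

    Ranges of at most `block` elements are summed sequentially to keep the
    recursion shallow (block=8 is an arbitrary illustrative choice).
    """
    if hi is None:
        hi = len(xs)
    if hi - lo <= block:
        total = 0.0
        for i in range(lo, hi):
            total += xs[i]
        return total
    mid = (lo + hi) // 2
    # Split the range in half, sum each half recursively, then add the two halves.
    return pairwise_sum(xs, lo, mid, block) + pairwise_sum(xs, mid, hi, block)


# Same data as sum.py, built without NumPy: squared deviations of 0..n-1.
n = 321000
mean = (n - 1) / 2                        # 160499.5, exact in float64
t2 = [(i - mean) * (i - mean) for i in range(n)]
exact = n * (n * n - 1) // 12             # closed-form sum, exact integer arithmetic

print("naive   :", naive_sum(t2))
print("pairwise:", pairwise_sum(t2))
print("exact   :", exact)
```

On this input every squared deviation is a multiple of 0.25 and exactly representable, so all error comes from the additions. In the pairwise version every partial sum stays below 2**51, where float64 still represents all multiples of 0.25, so its result equals the closed form exactly; the sequential loop's running total passes 2**51, after which each addition can round.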
sum.py
import numpy as np
import pyarrow.compute as pc

t = np.arange(321000, dtype='float64')
t2 = t - np.mean(t)  # deviations from the mean
t2 *= t2             # squared deviations
print('numpy sum:', np.sum(t2))
print('arrow sum:', pc.sum(t2))
Test result
# Verified with Wolfram Alpha (arbitrary precision): NumPy's result is correct.
$ ARROW_USER_SIMD_LEVEL=SSE4_2 python sum.py
numpy sum: 2756346749973250.0
arrow sum: 2756346749973248.0
$ ARROW_USER_SIMD_LEVEL=AVX2 python sum.py
numpy sum: 2756346749973250.0
arrow sum: 2756346749973249.0
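The reference value can also be checked by hand. Since t is 0, 1, …, n−1 and t2 holds its squared deviations from the mean (n−1)/2, the sum equals n times the variance (n²−1)/12 of that sequence:

```latex
\sum_{i=0}^{n-1}\left(i - \frac{n-1}{2}\right)^2 = \frac{n\,(n^2-1)}{12},
\qquad
\frac{321000\,(321000^2-1)}{12} = \frac{33076160999679000}{12} = 2756346749973250.
```

Between 2^51 and 2^52, adjacent float64 values are 0.5 apart, so the SSE4_2 result is 4 ulps below the exact sum and the AVX2 result is 2 ulps below it.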
Issue Links
 relates to

ARROW-11567 [C++][Compute] Variance kernel has precision issue
 Resolved
 links to