  1. Apache Arrow
  2. ARROW-11758

[C++][Compute] Summation kernel round-off error

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: C++

    Description

      The test below shows that the summation kernel is less precise than numpy.sum.
      NumPy implements pairwise summation [1], whose round-off error grows as O(log n), better than the O(n) error growth of naive summation.

      sum.py

      import numpy as np
      import pyarrow.compute as pc
      
      t = np.arange(321000, dtype='float64')
      t2 = t - np.mean(t)
      t2 *= t2
      
      print('numpy sum:', np.sum(t2))
      print('arrow sum:', pc.sum(t2))
      

      test result

      # Verified with Wolfram Alpha (arbitrary precision): NumPy's result is correct.
      $ ARROW_USER_SIMD_LEVEL=SSE4_2 python sum.py
      numpy sum: 2756346749973250.0
      arrow sum: 2756346749973248.0
      
      $ ARROW_USER_SIMD_LEVEL=AVX2 python sum.py 
      numpy sum: 2756346749973250.0
      arrow sum: 2756346749973249.0
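
      The reference value can also be cross-checked without any floating-point arithmetic. The sketch below is an illustration, not part of the original report: the helper name `exact_sum_of_squared_deviations` is hypothetical, and it relies on the fact that the mean of 0..n-1 is (n - 1) / 2, so every squared deviation is an exact rational number. It compares a brute-force rational sum against the closed form n(n^2 - 1)/12.

```python
from fractions import Fraction


def exact_sum_of_squared_deviations(n):
    """Exact sum of (i - mean)^2 for i in 0..n-1, with mean = (n - 1) / 2."""
    # 2*i - (n - 1) is twice the deviation of i from the mean, and is an integer.
    doubled_squares = sum((2 * i - (n - 1)) ** 2 for i in range(n))
    # Each term was scaled by 2 before squaring, so divide the total by 4.
    return Fraction(doubled_squares, 4)


n = 321000
exact = exact_sum_of_squared_deviations(n)
closed_form = Fraction(n * (n * n - 1), 12)

print("brute force :", exact)
print("closed form :", closed_form)
print("fits float64:", exact.denominator == 1 and exact < 2**53)
```

      Both values come out to 2756346749973250, which is an integer below 2^53 and therefore exactly representable as a float64. That agrees with the NumPy result above.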
      

      [1] https://en.wikipedia.org/wiki/Pairwise_summation
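
      To make the difference between the two strategies concrete, here is a minimal pure-Python sketch of naive and pairwise summation. It is a simplified illustration of the technique described in [1], not NumPy's or Arrow's actual implementation. The function names and the `block_size` cutoff of 128 are assumptions chosen for the example. `math.fsum` from the standard library serves as a correctly rounded reference.

```python
import math


def naive_sum(values, lo=0, hi=None):
    """Left-to-right accumulation; round-off error can grow as O(n)."""
    if hi is None:
        hi = len(values)
    total = 0.0
    for i in range(lo, hi):
        total += values[i]
    return total


def pairwise_sum(values, lo=0, hi=None, block_size=128):
    """Recursively split the range in half and add the two partial sums.

    Ranges of at most block_size elements are summed naively, so the
    recursion depth (and the error growth) is O(log n).
    """
    if hi is None:
        hi = len(values)
    if hi - lo <= block_size:
        return naive_sum(values, lo, hi)
    mid = lo + (hi - lo) // 2
    left = pairwise_sum(values, lo, mid, block_size)
    right = pairwise_sum(values, mid, hi, block_size)
    return left + right


def demo():
    # Same data as sum.py: squared deviations of 0..320999 from their mean.
    n = 321000
    mean = (n - 1) / 2
    data = [(i - mean) ** 2 for i in range(n)]
    print("naive sum   :", naive_sum(data))
    print("pairwise sum:", pairwise_sum(data))
    print("math.fsum   :", math.fsum(data))


if __name__ == "__main__":
    demo()
```

      The exact digits printed by the naive and pairwise variants depend on accumulation order, so they need not match the SSE4_2 or AVX2 results shown in the report.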

            People

              yibocai Yibo Cai

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 3.5h