Uploaded image for project: 'Commons Numbers'
  1. Commons Numbers
  2. NUMBERS-142

Improve LinearCombination accuracy during summation of the round-off errors



    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.0
    • Fix Version/s: None
    • Component/s: arrays
    • Labels:


      The algorithm in LinearCombination is an implementation of dot2 from Ogita el al (algorithm 5.3). There is a subtle difference in that the original dot2 algorithm sums the round-off from the products and the round-off from the product summation together. The method in LinearCombination sums them separately (using an extra variable) and adds them at the end. This actually improves the accuracy under conditions where the round-off is of greater sum than the products, as described below.

      The dot2 algorithm suffers when the total sum is close to 0 but the intermediate sum is far enough from zero that there is a large difference between the exponents of summed terms and the final result. In this case the sum of the round-off is more important than the sum of the products which due to massive cancellation is zero. The case is important for Complex numbers which require a computation of log1p(x^2+y^2-1) when x^2+y^2 is close to 1 such that log(1) would be ~zero yet the true logarithm is representable to very high precision.

      This can be protected against by using the dotK algorithm of Ogita et al with K>2. This saves all the round-off parts from each product and the running sum. These are subject to an error free transform that repeatedly adds adjacent pairs to generate a new split pair with a closer upper and lower part. Over time this will order the parts from low to high and these can be summed low first for an error free dot product.

      Using this algorithm with a single pass (K=3 for dot3) removes the cancellation error observed in the mentioned use case. Adding a single pass over the parts changes the algorithm from 25n floating point operations (flops) to 31n flops for the sum of n products.

      A second change for the algorithm is to switch to using Dekker's algorithm (Dekker, 1971) to split the number. This extracts two 26-bit mantissas from a 53-bit mantis (in IEEE 754 the leading bit in front of the of the 52-bit mantissa is assumed to be 1). This is done by multiplication by 2^s+1 with s = ceil(53/2) = 27:

      big = (2^s+1) * a
      a_hi = (big - (big - a))

      The extra bit of precision is carried into the sign of the low part of the split number.

      This is in contrast to the method in LinearCombination that uses a simple mask on the long representation to obtain the a_hi part in 26-bits and the lower part will be 27 bits.

      The advantage of Dekker's method is it produces 2 parts with 26 bits in the mantissa that can be multiplied exactly. The disadvantage is the potential for overflow requiring a branch condition to check for extreme values.

      It also appropriately handles very small sub-normal numbers that would be masked to create a 0 high part with all the non-zero bits left in the low part using the current method. This will have knock on effects on split multiplication which requires the high part to be larger.

      A simple change to the current implementation to use Dekker's split improves the precision on a wide range of test data (not shown).


        1. array_performance.jpg
          47 kB
          Alex Herbert
        2. cond_no.jpg
          27 kB
          Alex Herbert
        3. dot.jpg
          63 kB
          Alex Herbert
        4. error_vs_condition_no.jpg
          60 kB
          Alex Herbert
        5. inline_perfomance.jpg
          33 kB
          Alex Herbert
        6. linear_cached.jpg
          20 kB
          Alex Herbert
        7. linear_inline_vs_array.jpg
          25 kB
          Alex Herbert

          Issue Links



              • Assignee:
                aherbert Alex Herbert
                aherbert Alex Herbert
              • Votes:
                0 Vote for this issue
                2 Start watching this issue


                • Created:

                  Time Tracking

                  Original Estimate - Not Specified
                  Not Specified
                  Remaining Estimate - 0h
                  Time Spent - 1h 10m
                  1h 10m