[NUMBERS-184] Reduce number of operations in Precision.equals using a maxUlps - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Trivial
Resolution: Fixed
Affects Version/s: 1.0
Fix Version/s: 1.1
Component/s: core
Labels:
None

Description

The Precision class has a method to test whether two arguments are equal, allowing a maximum number of representable floating-point values (ULPs) between them.

The comparison is performed on the IEEE 754 bit layout of the floats. When the two inputs have opposite signs, a lot of code is used to compute the distance of each value from the bit representation of +0.0 or -0.0. This is redundant. For a value of either sign, its distance from the bit representation of 0.0 is obtained by clearing the sign bit of its bit representation. Here is an extract from the current method:

        final int xInt = Float.floatToRawIntBits(x);
        final int yInt = Float.floatToRawIntBits(y);

        final boolean isEqual;
        if (((xInt ^ yInt) & SGN_MASK_FLOAT) == 0) {
            // number have same sign, there is no risk of overflow
            isEqual = Math.abs(xInt - yInt) <= maxUlps;
        } else {
            // number have opposite signs, take care of overflow
            final int deltaPlus;
            final int deltaMinus;
            if (xInt < yInt) {
                deltaPlus  = yInt - POSITIVE_ZERO_FLOAT_BITS;
                deltaMinus = xInt - NEGATIVE_ZERO_FLOAT_BITS;
            } else {
                deltaPlus  = xInt - POSITIVE_ZERO_FLOAT_BITS;
                deltaMinus = yInt - NEGATIVE_ZERO_FLOAT_BITS;
            }            

            if (deltaPlus > maxUlps) {
                isEqual = false;
            } else {
                isEqual = deltaMinus <= (maxUlps - deltaPlus);
            }        
        }
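To illustrate the point about the sign bit, here is a minimal sketch. The class `ZeroDistance` and method `ulpsFromZero` are hypothetical illustration names and are not part of Commons Numbers. Clearing the sign bit of the raw bits gives the number of ULP steps between a non-NaN value and zero, for either sign:

```java
/** Hypothetical helper for illustration; not part of Commons Numbers. */
final class ZeroDistance {
    private ZeroDistance() {}

    /**
     * Number of ULP steps between a non-NaN x and zero.
     * Clearing the sign bit maps the bits of -x onto the bits of +x,
     * so +0.0f and -0.0f both give 0, and x and -x give the same result.
     */
    static int ulpsFromZero(float x) {
        return Float.floatToRawIntBits(x) & Integer.MAX_VALUE;
    }
}
```

For example, `-Float.MIN_VALUE` and `Float.MIN_VALUE` each give 1, so they are 2 ULPs apart across zero.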

The second branch can be simplified using bit masking.

            final int deltaPlus = xInt & Integer.MAX_VALUE;
            final int deltaMinus = yInt & Integer.MAX_VALUE;   
            isEqual = (long) deltaPlus + deltaMinus <= maxUlps;
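One way to check that the masked branch is equivalent to the original is to run both on the same bit patterns. The sketch below uses the hypothetical class `OppositeSignBranch`, which is not the library code. It copies the opposite-sign branch from the extract and places the masked replacement alongside it. Both methods assume the inputs have opposite sign bits. The zero-bit constants are derived here from `Float.floatToRawIntBits` instead of the library's own fields.

```java
/**
 * Hypothetical side-by-side of the opposite-sign branch before and after the
 * proposed change. Both methods assume xInt and yInt have opposite sign bits.
 */
final class OppositeSignBranch {
    private OppositeSignBranch() {}

    // Bit patterns of +0.0f (0x00000000) and -0.0f (0x80000000).
    private static final int POSITIVE_ZERO_FLOAT_BITS = Float.floatToRawIntBits(0.0f);
    private static final int NEGATIVE_ZERO_FLOAT_BITS = Float.floatToRawIntBits(-0.0f);

    /** Original branch: subtract the matching signed zero, then guard against overflow. */
    static boolean original(int xInt, int yInt, int maxUlps) {
        final int deltaPlus;
        final int deltaMinus;
        if (xInt < yInt) {
            // x has the sign bit set, so it is the negative value
            deltaPlus = yInt - POSITIVE_ZERO_FLOAT_BITS;
            deltaMinus = xInt - NEGATIVE_ZERO_FLOAT_BITS;
        } else {
            deltaPlus = xInt - POSITIVE_ZERO_FLOAT_BITS;
            deltaMinus = yInt - NEGATIVE_ZERO_FLOAT_BITS;
        }
        if (deltaPlus > maxUlps) {
            return false;
        }
        return deltaMinus <= (maxUlps - deltaPlus);
    }

    /** Proposed branch: clear both sign bits and sum the distances in a long. */
    static boolean masked(int xInt, int yInt, int maxUlps) {
        final int deltaPlus = xInt & Integer.MAX_VALUE;
        final int deltaMinus = yInt & Integer.MAX_VALUE;
        return (long) deltaPlus + deltaMinus <= maxUlps;
    }
}
```

Both deltas are non-negative, so `deltaPlus <= maxUlps && deltaMinus <= maxUlps - deltaPlus` holds exactly when `deltaPlus + deltaMinus <= maxUlps`, and the long sum cannot overflow.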

For the float method, overflow can be avoided by using a long to sum the two deltas, eliminating a further branch condition.
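Putting the pieces together, a float comparison with the masked, long-summed branch could look like the following. This is a minimal sketch under the assumption that the rest of the method keeps the structure of the extract. The class name `FloatUlps` is hypothetical, and this is not the Commons Numbers source. `Integer.MIN_VALUE` stands in for the sign mask `0x80000000`.

```java
/**
 * Hypothetical sketch of a float ULP comparison using the masked,
 * long-summed opposite-sign branch. Not the Commons Numbers source.
 */
final class FloatUlps {
    private FloatUlps() {}

    static boolean equals(float x, float y, int maxUlps) {
        final int xInt = Float.floatToRawIntBits(x);
        final int yInt = Float.floatToRawIntBits(y);

        final boolean isEqual;
        if (((xInt ^ yInt) & Integer.MIN_VALUE) == 0) {
            // Same sign: the int difference cannot overflow.
            isEqual = Math.abs(xInt - yInt) <= maxUlps;
        } else {
            // Opposite signs: sum the distances from zero in a long,
            // so the sum cannot overflow and no extra branch is needed.
            final int deltaPlus = xInt & Integer.MAX_VALUE;
            final int deltaMinus = yInt & Integer.MAX_VALUE;
            isEqual = (long) deltaPlus + deltaMinus <= maxUlps;
        }
        // A float NaN can be fewer than Integer.MAX_VALUE ULPs from zero,
        // so NaN must still be excluded explicitly.
        return isEqual && !Float.isNaN(x) && !Float.isNaN(y);
    }
}
```

Take `-Float.MAX_VALUE` and `Float.MAX_VALUE` as an example. Summing the two masked values in an int would wrap to -16777218 and wrongly compare as equal. The long sum is 4278190078, which correctly exceeds any int tolerance. The explicit NaN test is still needed: by this measure `-0.0f` and `Float.NaN` (bits `0x7fc00000`) are only 2143289344 ULPs apart.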

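A different optimisation can be performed for the double argument method. Since the maxUlps argument is an int, it can be at most Integer.MAX_VALUE. A NaN has all exponent bits set and a non-zero mantissa, so its bit value is more than (2047L << 52) above zero, far beyond any int. When the signs are opposite, the ULP distance is the sum of both distances from zero, so a pair that is equal within maxUlps cannot contain a NaN. Thus there is no need to check for NaN if the numbers have opposite signs.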
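A minimal sketch of the double method with this optimisation follows. The class name `DoubleUlps` is hypothetical, and this is not the Commons Numbers source. The opposite-sign branch returns without calling `Double.isNaN`. Unlike the float case, summing two long distances could overflow, so the sketch first checks `deltaPlus <= maxUlps` and then compares against `maxUlps - deltaPlus`.

```java
/**
 * Hypothetical sketch of a double ULP comparison that skips the NaN test
 * in the opposite-sign branch. Not the Commons Numbers source.
 */
final class DoubleUlps {
    private DoubleUlps() {}

    static boolean equals(double x, double y, int maxUlps) {
        final long xInt = Double.doubleToRawLongBits(x);
        final long yInt = Double.doubleToRawLongBits(y);

        if (((xInt ^ yInt) & Long.MIN_VALUE) == 0) {
            // Same sign: the long difference cannot overflow. NaN bit patterns
            // start 1 step above infinity, so NaN must be excluded explicitly.
            return Math.abs(xInt - yInt) <= maxUlps
                && !Double.isNaN(x) && !Double.isNaN(y);
        }

        // Opposite signs: distances from zero after clearing the sign bit.
        final long deltaPlus = xInt & Long.MAX_VALUE;
        final long deltaMinus = yInt & Long.MAX_VALUE;
        // Checking deltaPlus first keeps (maxUlps - deltaPlus) non-negative, so
        // nothing overflows. A NaN is more than (2047L << 52) from zero, beyond
        // any int maxUlps, so it always fails here without Double.isNaN.
        return deltaPlus <= maxUlps && deltaMinus <= maxUlps - deltaPlus;
    }
}
```

A plain `deltaPlus + deltaMinus` in a long would not be safe here. Two NaN distances can wrap to a negative sum, which is why the sketch uses the guarded form instead.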

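This optimisation is not possible for the float equals method with an int argument. A float NaN is only guaranteed to be more than (255 << 23) = 2139095040 above zero, which is less than Integer.MAX_VALUE. It could be made if the float method used a short for maxUlps, but that would be a breaking API change and cannot be done. For reference, the maximum difference for doubles is approximately 2^31 / 2^52 of the mantissa for double values with the same exponent, a relative error of approximately 4.77e-7.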
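The NaN bounds can be checked directly. In this minimal sketch, the class and method names are hypothetical. The bit value of positive infinity is the largest non-NaN distance from zero: `255L << 23` for float and `2047L << 52` for double. Every NaN lies strictly above it. The NaN test can only be skipped when that bound is at least the largest possible tolerance.

```java
/** Hypothetical check of the NaN distance bounds; not part of Commons Numbers. */
final class UlpBounds {
    private UlpBounds() {}

    // Distance from zero of +infinity: all exponent bits set, mantissa zero.
    // Every NaN has a non-zero mantissa, so its distance is strictly larger.
    static final long FLOAT_INFINITY_BITS = Float.floatToRawIntBits(Float.POSITIVE_INFINITY);
    static final long DOUBLE_INFINITY_BITS = Double.doubleToRawLongBits(Double.POSITIVE_INFINITY);

    /**
     * True if no NaN can be within maxUlps of zero, in which case an
     * opposite-sign comparison can never accept a NaN.
     */
    static boolean nanAlwaysRejected(long infinityBits, long maxUlps) {
        // NaN distance >= infinityBits + 1, which exceeds maxUlps iff infinityBits >= maxUlps.
        return infinityBits >= maxUlps;
    }
}
```

With these definitions, the double bound holds for `Integer.MAX_VALUE`. The float bound (2139095040) fails for `Integer.MAX_VALUE` (2147483647) but holds for `Short.MAX_VALUE`.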

Using a short for floats would allow 2^15 / 2^23 of the mantissa, a relative error of approximately 3.9e-3. Using an int argument allows an extreme relative error of 1 when both arguments have the same sign (0.0f and Float.MAX_VALUE are fewer than Integer.MAX_VALUE ULPs apart), and an absolute error of more than Float.MAX_VALUE when they have opposite signs. It makes no sense to compare two float values with a maximum possible ULP difference larger than the range from zero to infinity.
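To put the int tolerance in perspective, here is a minimal sketch (the class and method names are hypothetical). It shows that 0.0f and Float.MAX_VALUE are 0x7F7FFFFF = 2139095039 ULPs apart, which is within an int maxUlps. It also computes the within-exponent relative error bound maxUlps / 2^m for m explicit mantissa bits.

```java
/** Hypothetical illustration of how large an int ULP tolerance is; not library code. */
final class FloatTolerance {
    private FloatTolerance() {}

    /** ULP distance between two non-negative floats (same sign, so no overflow). */
    static int ulpsBetweenNonNegative(float a, float b) {
        return Math.abs(Float.floatToRawIntBits(a) - Float.floatToRawIntBits(b));
    }

    /** Relative error bound of maxUlps steps within one exponent: maxUlps / 2^mantissaBits. */
    static double relativeErrorBound(long maxUlps, int mantissaBits) {
        return Math.scalb((double) maxUlps, -mantissaBits);
    }
}
```

For example, `relativeErrorBound(1L << 31, 52)` returns 2^-21, about 4.77e-7, which matches the double figure quoted above.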

People

Assignee:: Unassigned

Reporter:: Alex Herbert

Votes:: 0

Watchers:: 1

Dates

Created:: 15/Feb/22 17:16

Updated:: 01/Nov/22 11:00

Resolved:: 16/Feb/22 13:48