Apache OpenOffice (AOO) Bugzilla – Issue 75279
improvement for BINOMDIST
Last modified: 2017-05-20 10:30:57 UTC
The function BINOMDIST overflows internally at comparatively small arguments. For example, BINOMDIST(k;1458;0,4;0) returns the error #VALUE!. This can be avoided by optimizing the order of multiplications and divisions during the calculation. I'll attach a spreadsheet with a BASIC macro that shows how to do it; use sheet "Tabelle2" to compare my macro with the built-in function. Besides avoiding the overflow, the macro also has better precision.
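The overflow can be reproduced outside Calc. The sketch below is a standalone illustration, not the AOO code (function names are mine): building the binomial coefficient as an explicit double overflows for n = 1458, while evaluating the same mass function in log space stays finite.

```cpp
#include <cassert>
#include <cmath>

// Naive PMF: build C(n,k) as a double, then multiply by p^k * q^(n-k).
// For n = 1458 the coefficient alone exceeds DBL_MAX, so the result
// degenerates to +inf even though the true PMF is a modest number.
double naive_binom_pmf(int n, int k, double p) {
    double coeff = 1.0;
    for (int i = 1; i <= k; ++i)
        coeff *= static_cast<double>(n - k + i) / i;  // C(n,k), incrementally
    return coeff * std::pow(p, k) * std::pow(1.0 - p, n - k);
}

// Log-space PMF: all huge/tiny factors are combined as logarithms and
// only the final, moderate result is exponentiated.
double log_binom_pmf(int n, int k, double p) {
    double log_coeff = std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
                     - std::lgamma(n - k + 1.0);
    return std::exp(log_coeff + k * std::log(p)
                    + (n - k) * std::log(1.0 - p));
}
```

For n = 1458, p = 0.4 and k near the mean, the naive version returns +inf while the log-space version returns a value around 0.02.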
Created attachment 43640 [details] spreadsheet including the macro
Reassigning features and enhancements to user requirements@openoffice.org, which was introduced some time ago as the default owner for such tasks.
In the meantime the Beta function has been implemented. It can be used for higher numbers of trials. For example, for the probability mass function: PushDouble( exp( x*log(p)+(n-x)*log(q)-log(n+1.0)-GetLogBeta(n-x+1.0,x+1.0) ) );
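GetLogBeta is internal to the interpreter, but the same expression can be sketched with std::lgamma, using log B(a,b) = lgamma(a) + lgamma(b) - lgamma(a+b). The underlying identity is C(n,x) = 1 / ((n+1) * B(x+1, n-x+1)). A minimal standalone version (function names are mine, not AOO's):

```cpp
#include <cassert>
#include <cmath>

// log B(a,b) via lgamma; stands in for the interpreter's GetLogBeta().
double log_beta(double a, double b) {
    return std::lgamma(a) + std::lgamma(b) - std::lgamma(a + b);
}

// Binomial PMF evaluated exactly as in the comment above:
// exp( x*log p + (n-x)*log q - log(n+1) - log B(n-x+1, x+1) )
double binom_pmf(double n, double x, double p) {
    double q = 1.0 - p;
    return std::exp(x * std::log(p) + (n - x) * std::log(q)
                    - std::log(n + 1.0) - log_beta(n - x + 1.0, x + 1.0));
}
```

For small n this agrees with the direct formula, e.g. binom_pmf(10, 3, 0.5) equals C(10,3)/2^10 = 0.1171875, and it stays finite for n in the thousands.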
Created attachment 75585 [details] Screenshot to compare OOo with Gnumeric
The attached document shows some critical values as a chart. The document contains a picture of the same chart in Gnumeric, where one can see that it should be a straight line. (The spreadsheet contains a function TTT in which I have tested the mentioned solution with the Beta function; it will give no results for you.)
Nice, please go ahead :-) This is not a feature request, reassigning to the general spreadsheet owner.
In theory it should work with PushDouble(GetBetaDistPDF(p, x+1.0, n-x+1.0)/(n+1)); in the probability mass function case and with PushDouble(GetBetaDist(1.0-p,n-x,x+1)); in the cumulative distribution function case. For the probability mass function this works well, but for the cumulative distribution function there are some zeros where there should not be zeros, although this solution is already better than the current implementation. I suspect an underflow, likely in lcl_GetBetaHelperContFrac, but I have not investigated it yet.
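The CDF variant relies on the identity P(X <= x) = I_{1-p}(n-x, x+1), where I is the regularized incomplete beta function that GetBetaDist computes. The sketch below checks this identity numerically; Simpson integration of the beta density stands in for GetBetaDist, and all names are mine, not the interpreter's:

```cpp
#include <cassert>
#include <cmath>

double log_beta(double a, double b) {
    return std::lgamma(a) + std::lgamma(b) - std::lgamma(a + b);
}

// Beta(a,b) density at t.
double beta_pdf(double t, double a, double b) {
    return std::exp((a - 1.0) * std::log(t) + (b - 1.0) * std::log(1.0 - t)
                    - log_beta(a, b));
}

// Regularized incomplete beta I_z(a,b) by composite Simpson's rule;
// a crude stand-in for the interpreter's GetBetaDist(z, a, b).
double beta_cdf(double z, double a, double b) {
    const int steps = 2000;                 // even number of subintervals
    double h = z / steps, sum = 0.0;
    for (int i = 0; i <= steps; ++i) {
        double t = i * h;
        double f = (t <= 0.0 || t >= 1.0) ? 0.0 : beta_pdf(t, a, b);
        double w = (i == 0 || i == steps) ? 1.0 : (i % 2 ? 4.0 : 2.0);
        sum += w * f;
    }
    return sum * h / 3.0;
}

// Direct binomial CDF for comparison: sum_{j=0}^{x} C(n,j) p^j q^(n-j).
double binom_cdf_direct(int n, int x, double p) {
    double sum = 0.0, coeff = 1.0;          // C(n,0) = 1
    for (int j = 0; j <= x; ++j) {
        sum += coeff * std::pow(p, j) * std::pow(1.0 - p, n - j);
        coeff *= static_cast<double>(n - j) / (j + 1);  // C(n,j+1)
    }
    return sum;
}
```

With n = 20, x = 7, p = 0.3, beta_cdf(0.7, 13, 8) and the direct sum agree to well below 1e-6, which is the identity the proposed PushDouble(GetBetaDist(1.0-p,n-x,x+1)) line exploits.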
Created attachment 75895 [details] Patch against DEV300m100, comments in issue and file testCases
Created attachment 75896 [details] commented test cases for B and BINOMDIST
Problems and solutions
======================
For illustrations see the attached document "testCases.ods".
1. Getting zero in GetBetaDist does not come from lcl_GetBetaHelperContFrac but from a missing underflow case. I have added that case.
2. When q^n and p^n both underflow, the value of BINOMDIST or B, respectively, is calculated by approximation with the beta distribution. This gives an accuracy of about 13 decimal digits. This is the really new part of my patch. [Sheet "no value"]
3. When p is near 1, then q=1-p suffers severe cancellation errors. But nothing can be done about that: by the time the interpreter gets the value p it is already a binary double, and the exact decimal value cannot be reconstructed. [Sheet "small n"]
4. The current code has a line fSum = 1.0-fFactor and then subtracts values in a loop. This leads to errors when fFactor < machine epsilon. I detect this case and sum up instead. [Sheet "constant value"]
5. The summing-up parts can result in values greater than 1. Such values are impossible for cumulative distribution functions. I force the value to 1 in those cases. [Sheet "illegal value"]
6. The mass function is used inside BINOMDIST and in B. Therefore I separated it into a method. It cannot be a local function because it uses GetBetaDistPDF.
7. Summing up values of a range is used inside B and inside BINOMDIST. Therefore I separated it into the local function lcl_GetBinomDistRange.
8. I have changed ULONG (now sal_uLong) to sal_Int32. This works, and has so far, because higher values of n never reach the for-loop, since p^n and q^n will underflow first.
9. The current version calculates with denormalized values of fFactor. This gives wrong results for fFactor near zero. I use the threshold ::std::numeric_limits<double>::min() instead. [Sheet "B range"]
10. Calculations near the thresholds ::std::numeric_limits<double>::min() and fMachEps might not be optimal.
But I think it is not worth detecting whether, for example, 1E-308 would be a better threshold than ::std::numeric_limits<double>::min().
11. The calculation of the cases B(n;p;n-k;n) is wrong when q^n underflows: in those cases the summand p^n is missing. I have rewritten the loops and initializations in lcl_GetBinomDistRange. I use:
   sum from j=xs to xe { (n choose j) * (q/p)^(n-j) * p^n }
   = sum from j=xs to xe { (n choose n-j) * (q/p)^(n-j) * p^n }
   = sum from i=n-xe to n-xs { (n choose i) * (q/p)^i * p^n }    (substituting i=n-j)
[Sheet "B range"]
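The substitution in item 11 can be checked numerically. The sketch below is not the patch code (it evaluates every term in log space instead of multiplying incrementally, and the names are mine); it compares the direct sum over j = xs..xe with the rewritten sum over i = n-xe..n-xs, which factors out p^n instead of q^n:

```cpp
#include <cassert>
#include <cmath>

// ln C(n,k) via lgamma, to keep the coefficients finite.
double log_choose(int n, int k) {
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
         - std::lgamma(n - k + 1.0);
}

// Direct form: sum_{j=xs}^{xe} C(n,j) p^j q^(n-j).
double b_range_direct(int n, double p, int xs, int xe) {
    double q = 1.0 - p, sum = 0.0;
    for (int j = xs; j <= xe; ++j)
        sum += std::exp(log_choose(n, j) + j * std::log(p)
                        + (n - j) * std::log(q));
    return sum;
}

// Rewritten form from item 11, using C(n,j) = C(n,n-j) and i = n-j:
// sum_{i=n-xe}^{n-xs} C(n,i) (q/p)^i p^n.
double b_range_rewritten(int n, double p, int xs, int xe) {
    double q = 1.0 - p, sum = 0.0;
    for (int i = n - xe; i <= n - xs; ++i)
        sum += std::exp(log_choose(n, i) + i * std::log(q / p)
                        + n * std::log(p));
    return sum;
}
```

For p near 1 (the case where q^n underflows first) both forms agree to machine precision, e.g. for B(50; 0.9; 40; 50).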
Thanks, grabbing issue.
In cws calc66: changeset e95dfb8df998
http://hg.services.openoffice.org/cws/calc66/changeset/e95dfb8df998
M sc/source/core/inc/interpre.hxx
M sc/source/core/tool/interpr3.cxx
You can observe the progress and possible integration date of CWS calc66 at http://tools.services.openoffice.org/EIS2/cws.ShowCWS?Path=DEV300%2Fcalc66
Reassigning to QA for verification.
thomas.klarhoefer@oracle.com