Issue 34093

Summary: Force linear regression curve to go through zero point (origin) or other Y-value defined
Product: General Reporter: denisl <dlebeuf>
Component: chartAssignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: alek.email, belegdol, issues, jbf.faure, jumbo4444, kamataki, robert.pollak, tony.galmiche.ooo, weigel
Version: 3.3.0 or older (OOo)Keywords: ms_interoperability, oooqa, rfe_eval_ok
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: FEATURE Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on:    
Issue Blocks: 15522    
Attachments:
Description Flags
Screenshot from competitor. Define y axis intercept in options dialog of regression curve. none

Description denisl 2004-09-12 20:22:00 UTC
Whith 2 columns data such as

X Y
0 0
1 1
2 2
3 3

Select XY Chart as plot type, then Insert -> Statistics -> Linear Regression
There is no option as in the LINEST function to force a regression through the
zero point and graph the corresponding line.
Issue 33883 is somewhat related.
Comment 1 denisl 2004-09-12 20:24:52 UTC
Oups, The numerical example is not very good,

X Y
0 0
1 1
2 2.5
3 3
Would be better
Comment 2 kla 2004-09-17 13:48:54 UTC
HI Bettina,
one for you.
tk
Comment 3 kla 2004-09-17 13:50:21 UTC
reset Target
Comment 4 bettina.haberer 2004-11-29 18:23:29 UTC
Reassigned to Ingrid.
Comment 5 Matthias Basler 2005-03-29 12:59:28 UTC
I support this request. As scientifically working student I know that some
(linear!) regressions need to go through the origin, by definition. Currently it
is not possible to visualize these in charts.
Is this already incorporated into the proposal for the new chart engine?

One very simple solution is to just add an additional regression type
"Linear regression through origin" to the list of available regressions in the
data series dialog.
Comment 6 IngridvdM 2005-04-14 15:34:08 UTC
We should introduce this missing feature. Even more it should be possible to
force the regression line to go through a customizable value - not limitting us
to 0.
->bm: Please take care of this issue.
Comment 7 oooer 2006-07-11 21:00:39 UTC
"As scientifically working student" you should question if this a valid thing 
to do.

If a physical relationship is known to go through zero and your data does not, 
you should analyse why that is and write an appropriate conclusion not falsify 
your results by arbitarily forcing a false fit.

One common reason for this effect (apart from experimental error) is that least 
squares fit assumes that the y residuals are far greater than the x residuals. 
ie. x is a controlled variable with negligable error s.t. rx/ry -> 0 

If this is not the case, the derivation of the least squares line fit is NOT 
mathematically valid.

I have seen examples if dispersed data where the least squares fit was 
"visibly" wrong by a significant degree. The data did not fit the above 
requirement and the fit was garbage.


Most people are quite ignorant of this important limitation in using linear 
regression.

Least squares can also be strongly influenced by an erroneous data point (a 
flyer).

One further cause is that most implementations of least squares minimise the 
squares of the y residuals not the perpendiculars to the fitted line.


I see little use of this "feature" as suggested. If a user wants to add a 
fictive slope to his data he can simply add a line close to the lin. regression 
fit then remove the true line.

Adding this as a function will simply mislead non-technical users into thinking 
that this is an alternative VALID regression fit. It is not.

That would seem undersirable.

Comment 8 bjoern.milcke 2006-07-12 09:29:13 UTC
->oooer: Thanks for you comments. You are probably right, that forcing a
regression curve to go through a certain point might lead to wrong results
without the user noticing this.

Well, we have one vote, one comment here FOR having the feature and one AGAINST.
I must admit I do not have too much practical experience with regression, so I
cannot really vote for or against this feature. The only point that is a FOR is,
that MS has this feature in Excel (that's why this issue has the
interoperability keyword), thus we will need this feature for improving our
import filters.

Of course, we still have to decide about a UI for it. And maybe we can show a
warning dialog that tells a user that what he does might not be what he wants.

->oooer: Is there a way to analyze the data and find out if limiting the curve
to a certain point makes sense or not? I mean, I could imagine that if your
curve passes the point only by a very small amount (relative error), it might
not completely fake the result. Of course, the question still remains, why you
would want to adjust the curve to pass a certain point. It is only a regression
or trend curve, not something where you can read data of.
Comment 9 Matthias Basler 2006-07-12 18:01:53 UTC
Thanks for your thoughtful comment, oooer.

May I suggest following compromise: There SHOULD be a feature to have a graph
going through the point of origin, but it should NOT be termed "linear
regression", since it is none, as you pointed out. Maybe it could be termed
"Best linear fit though point of origin" or anything similar that makes a quick
point without being confused with linear regression or other inappropriate
statistical terms. (Suggestions welcome.)

Adding a line by hand is not a solution at all, since the whole "idea" of the
feature is, that the slope is calculated from the data. When drawing the line by
hand one would have to calculate the line's angle manually beforehand.
Beside that, "drawing" on diagrams is imho never a good idea, since the diagram
may change in size and content, so the drawn line wouldn't fit any more after
any change.
Comment 10 oooer 2006-07-14 19:38:53 UTC
@bm
hmm, I did not realise this was a cloning issue. I recognise the idea of 
maximising compatability of import filters but I would have hoped that rather 
than being a partial clone of a not-too-recent MSO, things like this would be 
where OpenOffice could distinguish itself as a superior product.

The other problem with copying MSO is that users will expect the _same_ false 
results or else they will complain the OpenOffice does not work correctly, so 
you will need to guess how they force the false zero.

There seem to be too different senarios: the import filter and if/how to add 
this functionality to Calc UI.

In the case of the import filter the idea of a dlg is good. I would favour 
quite explicit warning that this is included for compatability only and will 
produce an invalid fit and incorrect slope. Preferably with a help button 
taking them to some text that explains the limitations of lin.regression and 
the implications of cheating the fit. Once aware of the choice the user can 
decide what to do.


As a feature for Calc UI, I think ANY attempt at lin. regression should post a 
warning as to the limitations of the method that will be used and a link to 
more info (I see that as the duty of the program since the user cannot know how 
it is done and will assume in good faith that the methodology is valid).

@matthias 
>>Adding a line by hand is not a solution at all, since the whole "idea" of the
feature is, that the slope is calculated from the data.

That's just the problem, calculating a slope that comes out WROMG is not 
calculating from the data but gives the impression it is. It's worse than 
having to do a fit by eye because it has a undue bona fide of having been 
calculated by a computer.

Maybe it would make more sence to provide an easy means to fit a line by eye 
that will follow any resizing etc. in the same way as an applied regression 
formula apparently does and then show the formula.

I'm thinking of the way that gimp or blender let you create a line or bezier 
curve and pull it around until it fits. At least for the linear case this would 
be easy to program. Link it to the chart so that is scales and moves with it.

This will allow those interested in bending the truth to fit thier data to do 
what they need to do and also allow a valid means of fitting a line to data 
that do not fit the criteria for applying least squares analysis.

I'm not suggesting we make the use get out pen and paper here. The human brain 
is much better at seeing the "best fit" than an incorrectly applied bit of 
maths.

I cannot even imagine the ammount of innocently derived yet spurious 
"scientific results" caused by inappropriate use of linear regression. One 
major cause of this (appart from the quality of science education) is 
spreadsheet software that makes it as easy as clicking a button.

I think OpenOffice has a chance to innovate rather than follow here.

Thanks for your interest in my comments.







Comment 11 oooer 2006-07-14 20:41:32 UTC
>> since the diagram may change in size and content, so the drawn line wouldn't 
>> fit any more after any change.

@matthias
sorry, I did not understand your point until I reread this.

If a user fits a line by eye , he will know it is no longer valid if he changes 
the data. 

If he sets an automatic regression to inappropriate data it will never fit but 
he will believe it does. 

The former would seem to be the better choice.

Comment 12 Matthias Basler 2006-07-16 22:28:26 UTC
@oooer:
You see this from a strict scientifically point of view when you say "bend the
truth". I may point out that Office suites are used (and are to be used) by all
sorts of people, some of which might not even care about scientific statistics,
just want a nice fitting for visualization. 
Take f.e. a businessman that wants to visualize the mean growth of their
company's profits and which knows that it had 0 profits in, lets say, 2000,
because it the company was just founded this year. He/she wants a diagram with a
line going through the "(2000, 0)" point simply for the look of it.

Also sometimes constraints such as "fixed points" simply exist. Sometimes it is
better for a computer model to prefer a slightly wrong relationship formula over
the correct regression, simply to avoid computation side effects and artefacts,
or to simplify calculation. You'd maybe call this unscientifically, but such
tradeoffs  are imho justified in science - as long as they are documented.
(I won't go deeper in this - this mailing list is the wrong place for such debates.)

Specifically to your comments:
- Having warning dialogs if the user does something "unscienfically" would be
very appropriate in a scientific software like an actual statistics program. For
a "general" office suite I find this out of scope. Imho OOo cannot (and should
not even try) to find out when, lets say, diagramm types are inappropriate,
diagram colors are hard to tell apart, formulae are nonsense or anything of this
sort.

> - "The human brain is much better at seeing the "best fit" than an incorrectly
> applied bit of maths."
You seem to have a much better brain for this, since with my brain I would not
dare to fit a line manually, except for a rough sketch. ;-)
But seriously, people don't use OOo's diagram features to "paint". And I don't
like this "penalty" approach: "If you are not doing strict statistics, use OOo
Draw and paint your line by hand."
Comment 13 IngridvdM 2006-11-06 21:34:41 UTC
*** Issue 71191 has been marked as a duplicate of this issue. ***
Comment 14 belegdol 2007-11-06 19:34:43 UTC
Any updates on this? REGLINP already supports a fixed intercept, so...
Comment 15 belegdol 2007-11-06 20:51:05 UTC
Umm, I mean LINEST. /me loves translating the functions.
Comment 16 IngridvdM 2008-07-03 11:39:22 UTC
change owner
Comment 17 paul8paul 2008-07-03 13:20:47 UTC
I agree.  This is very important function for me as a science student. I can't
stop using Microsoft Office until I can do this in Open Office - at least for
linear regression.  In fact in the interest of flexibility I think you should be
able to choose exactly where it intersects either axis.

I know there are scientific arguments for and against this function but what
difference does it make who is scientifically right or wrong.  If this is a
function that users want to use then Open Office should be able to do it as
simply and quickly as possible.
Comment 18 paul8paul 2008-07-03 13:26:44 UTC
Its also necessary to be able to extrapolate the line as much as desired in
either direction.
Comment 19 IngridvdM 2008-07-22 14:21:34 UTC
reset to new
Comment 20 Regina Henschel 2008-11-23 01:13:54 UTC
*** Issue 96461 has been marked as a duplicate of this issue. ***
Comment 21 dskene 2008-11-23 03:21:16 UTC
There are many comments about this not being important as it is "faking data". 
That simply isn't the case. In the real world and not just on paper, there are 
error tolerances in what can be measured and they are often not exact. This is 
why you need a trendline in the first place. If you have only a small number of 
points there is a higher probability of the fit not meeting reality. Now, those 
who suggest that the data really does go through a different y intercept there 
are examples where it simply must be through a given point, such as zero. For 
example, if you plot concentration versus light absorbance then zero 
concentration must have zero absorbance. Extension of a spring might be another 
example. If there is no mass on the spring it needs to have a zero extension. 
The data in these examples would be MORE incorrect to have them pass through a 
y intercept other than zero.
Comment 22 Stefan Weigel 2008-12-12 20:12:46 UTC
We should not only be able to force the regression curve to go through zero, but
also through any other definable y axis intercept. I will attach a screenshot
from main competitor as an example (sorry only in german).
Comment 23 Stefan Weigel 2008-12-12 20:14:05 UTC
Created attachment 58788 [details]
Screenshot from competitor. Define y axis intercept in options dialog of regression curve.
Comment 24 shamran 2009-03-01 14:41:43 UTC
I have used OO for a number of years, and this feature is one I Keep missing
every so often. So please find the resources to implement this feature.
Comment 25 lka 2010-05-13 10:21:59 UTC
Same as shamran (and totally agree with dskene's post).
So UP!
Comment 26 jumbo444 2010-09-27 08:55:00 UTC
Hello,

I am pleased to inform you that Marcin Gutman has build an extension based on
Laurent Godard's macro CorelPoly. It can be downloaded at
http://extensions.services.openoffice.org/node/4387?

It is not the solution of this issue but it looks, from my point of view, has a
good workaround. In addition to polynomial regression, it also enable "force
intercept" option.

Please add your comments on this extension NOT HERE, but on extension web-page.
Comments and users feedback are welcome.
Next versions will include more robust method thanks to LINEST function, and
some translation.
Comment 27 jumbo444 2010-12-22 08:36:47 UTC
Hello,

CorelPolyGUI extension has been updated
http://extensions.services.openoffice.org/en/node/4634

It contains more robust method (QR decomposition), a simpler way to use "force
intercept" in case of linear problem, and translation in French and Polish.
Comment 28 jumbo444 2010-12-23 21:06:08 UTC
Sorry for the broken link. It is better to use:
http://extensions.services.openoffice.org/en/project/CorelPolyGUI
Latest version 0.3.2 corrects minor bugs.
Comment 29 Laurent BP 2011-10-04 08:55:13 UTC
Hello,

New version of CorelPoly (v 1.0!) available on
http://extensions-test.libreoffice.org/extension-center/improved-trend-lines/
and soon on OOo repository.
Now all classic regression are included, with force intercept option.
Comment 30 Edwin Sharp 2014-01-14 13:10:14 UTC
*** Issue 97439 has been marked as a duplicate of this issue. ***