Issue Details (XML | Word | Printable)

Key: MATH-278
Type: New Feature New Feature
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Luc Maisonobe
Reporter: Eugene Kirpichov
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Commons Math

Robust locally weighted regression (Loess / Lowess)

Created: 19/Jun/09 08:13 PM   Updated: 07/Aug/09 09:17 AM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works loess.patch 2009-06-19 08:14 PM Eugene Kirpichov 25 kB
File Licensed for inclusion in ASF works loess.patch.v2 2009-06-20 05:13 AM Eugene Kirpichov 24 kB

Resolution Date: 20/Jun/09 01:45 PM


 Description  « Hide
Attached is a patch that implements the robust Loess procedure for smoothing univariate scatterplots with local linear regression ( http://en.wikipedia.org/wiki/Local_regression) described by William Cleveland in http://www.math.tau.ac.il/~yekutiel/MA%20seminar/Cleveland%201979.pdf , with tests.

(Also, the patch fixes one missing-javadoc checkstyle warning in the AbstractIntegrator class: I wanted to make it so that the code with my patch does not generate any checkstyle warnings at all)

I propose to include the procedure into commons-math because commons-math, as of now, does not possess a method for robust smoothing of noisy data: there is interpolation (which virtually can't be used for noisy data at all) and there's regression, which has quite different goals.
Loess allows one to build a smooth curve with a controllable degree of smoothness that approximates the overall shape of the data.

I tried to follow the code requirements as strictly as possible: the tests cover the code completely, there are no checkstyle warnings, etc. The code is completely written by myself from scratch, with no borrowings of third-party licensed code.

The method is pretty computationally intensive (10000 points with a bandwidth of 0.3 and 4 robustness iterations take about 3.7sec on my machine; generally the complexity is O(robustnessIters * n^2 * bandwidth)), but I don't know how to optimize it further; all implementations that I have found use exactly the same algorithm as mine for the unidimensional case.

Some TODOs, in vastly increasing order of complexity:

  • Make the weight function customizable: according to Cleveland, this is needed in some exotic cases only, like, where the desired approximation is non-continuous, for example.
  • Make the degree of the locally fitted polynomial customizable: currently the algorithm does only a linear local regression; it might be useful to make it also use quadratic regression. Higher degrees are not worth it, according to Cleveland.
  • Generalize the algorithm to the multidimensional case: this will require A LOT of hard work.


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Eugene Kirpichov added a comment - 19/Jun/09 08:14 PM
Attached the patch.

Eugene Kirpichov added a comment - 19/Jun/09 08:22 PM
Hm, I just understood that the moving weighted regression may be optimized from O(n^2) to O(n log n) by expressing it as a convolution with the weight function and using an FFT, probably. However, let's delay implementing that for some time..

Sebb added a comment - 19/Jun/09 08:53 PM
serialVersionUID should be private
The patch includes an unrelated change to AbstractIntegrator.java.

It would be useful to add a constructor which had parameters for bandwidth and roubustnessIterators, and drop the corresponding setxxx() methods
The fields could then be made final, and the class would then be immutable and thus thread-safe.

The SVN keyword $Date$ causes problems when checking releases, so I'd recommend that it is removed.


Eugene Kirpichov added a comment - 20/Jun/09 05:13 AM
Attached a patch that does not change the AbstractIntegrator class, the $Date$ argument is replaced with '???', and parameters are made final and initialized in two constructors. Tests and Javadocs updated accordingly.

Actually, I don't know what the $Revision$ and $Date$ are for and where they come from. Are they filled in automatically by a pre-commit hook? If so, should I leave them like '???' in the patch?
If I omit them altogether, I get a checkstyle error about the missing @version tag.


Eugene Kirpichov added a comment - 20/Jun/09 05:37 AM
(No, the FFT optimization may only be done if the abscissae are an arithmetic progression. Might make sense to include this as a special case in a separate method)

Sebb added a comment - 20/Jun/09 10:31 AM
SVN keywords are filled in when the code is fetched from the server (and stripped off on upload).

I meant to just remove $Date:$ from the code, leaving:

  • @version $Revision: $

however that can be done when the patch is applied.


Luc Maisonobe added a comment - 20/Jun/09 01:45 PM
solved in subversion repository as of r786821
applied patch with minor changes
thanks for the patch

Luc Maisonobe added a comment - 07/Aug/09 09:17 AM
closing resolved issue for 2.0 release