Issue Details (XML | Word | Printable)

Key: MATH-278
Type: New Feature New Feature
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Luc Maisonobe
Reporter: Eugene Kirpichov
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Commons Math

Robust locally weighted regression (Loess / Lowess)

Created: 19/Jun/09 08:13 PM   Updated: 07/Aug/09 09:17 AM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works loess.patch 2009-06-19 08:14 PM Eugene Kirpichov 25 kB
File Licensed for inclusion in ASF works loess.patch.v2 2009-06-20 05:13 AM Eugene Kirpichov 24 kB

Resolution Date: 20/Jun/09 01:45 PM


 Description  « Hide
Attached is a patch that implements the robust Loess procedure for smoothing univariate scatterplots with local linear regression ( http://en.wikipedia.org/wiki/Local_regression) described by William Cleveland in http://www.math.tau.ac.il/~yekutiel/MA%20seminar/Cleveland%201979.pdf , with tests.

(Also, the patch fixes one missing-javadoc checkstyle warning in the AbstractIntegrator class: I wanted to make it so that the code with my patch does not generate any checkstyle warnings at all)

I propose to include the procedure into commons-math because commons-math, as of now, does not possess a method for robust smoothing of noisy data: there is interpolation (which virtually can't be used for noisy data at all) and there's regression, which has quite different goals.
Loess allows one to build a smooth curve with a controllable degree of smoothness that approximates the overall shape of the data.

I tried to follow the code requirements as strictly as possible: the tests cover the code completely, there are no checkstyle warnings, etc. The code is completely written by myself from scratch, with no borrowings of third-party licensed code.

The method is pretty computationally intensive (10000 points with a bandwidth of 0.3 and 4 robustness iterations take about 3.7sec on my machine; generally the complexity is O(robustnessIters * n^2 * bandwidth)), but I don't know how to optimize it further; all implementations that I have found use exactly the same algorithm as mine for the unidimensional case.

Some TODOs, in vastly increasing order of complexity:

  • Make the weight function customizable: according to Cleveland, this is needed in some exotic cases only, like, where the desired approximation is non-continuous, for example.
  • Make the degree of the locally fitted polynomial customizable: currently the algorithm does only a linear local regression; it might be useful to make it also use quadratic regression. Higher degrees are not worth it, according to Cleveland.
  • Generalize the algorithm to the multidimensional case: this will require A LOT of hard work.


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Eugene Kirpichov made changes - 19/Jun/09 08:14 PM
Field Original Value New Value
Attachment loess.patch [ 12411258 ]
Eugene Kirpichov made changes - 20/Jun/09 05:13 AM
Attachment loess.patch.v2 [ 12411301 ]
Luc Maisonobe made changes - 20/Jun/09 01:45 PM
Resolution Fixed [ 1 ]
Assignee Luc Maisonobe [ luc ]
Status Open [ 1 ] Resolved [ 5 ]
Luc Maisonobe made changes - 07/Aug/09 09:17 AM
Status Resolved [ 5 ] Closed [ 6 ]