Commons Math / MATH-278

Robust locally weighted regression (Loess / Lowess)

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      Attached is a patch, with tests, that implements the robust Loess procedure for smoothing univariate scatterplots with local linear regression (http://en.wikipedia.org/wiki/Local_regression), as described by William Cleveland (http://www.math.tau.ac.il/~yekutiel/MA%20seminar/Cleveland%201979.pdf).

      (The patch also fixes one missing-javadoc checkstyle warning in the AbstractIntegrator class, so that the code with my patch applied generates no checkstyle warnings at all.)

      I propose including the procedure in commons-math because commons-math currently has no method for robust smoothing of noisy data: there is interpolation (which is practically unusable for noisy data) and there is regression, which has quite different goals.
      Loess allows one to build a smooth curve with a controllable degree of smoothness that approximates the overall shape of the data.
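
      For readers unfamiliar with the procedure, here is a minimal sketch of robust Loess as Cleveland describes it: a tricube-weighted local linear fit at each point, followed by bisquare re-weighting of the residuals. It is not the code from the attached patch. The class name LoessSketch, the method names, the rounding used to pick the neighbourhood size, and the fallbacks for degenerate neighbourhoods are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the code from loess.patch. */
final class LoessSketch {
    private LoessSketch() { }

    /** Tricube neighbourhood weight: (1 - |u|^3)^3 for |u| < 1, else 0. */
    static double tricube(double u) {
        double a = Math.abs(u);
        if (a >= 1.0) {
            return 0.0;
        }
        double t = 1.0 - a * a * a;
        return t * t * t;
    }

    /** Bisquare robustness weight: (1 - u^2)^2 for |u| < 1, else 0. */
    static double bisquare(double u) {
        if (Math.abs(u) >= 1.0) {
            return 0.0;
        }
        double t = 1.0 - u * u;
        return t * t;
    }

    /** Assumes x is sorted ascending, x.length == y.length >= 2, and 0 < bandwidth <= 1. */
    static double[] smooth(double[] x, double[] y, double bandwidth, int robustnessIters) {
        int n = x.length;
        // Neighbourhood size (assumed rounding rule), clamped to [2, n].
        int k = Math.max(2, Math.min(n, (int) Math.round(bandwidth * n)));
        double[] fit = new double[n];
        double[] robustness = new double[n];
        Arrays.fill(robustness, 1.0);

        for (int iter = 0; iter <= robustnessIters; iter++) {
            for (int i = 0; i < n; i++) {
                // Grow a contiguous window around i until it holds the k nearest points.
                int left = i;
                int right = i;
                while (right - left + 1 < k) {
                    if (left == 0) {
                        right++;
                    } else if (right == n - 1) {
                        left--;
                    } else if (x[i] - x[left - 1] <= x[right + 1] - x[i]) {
                        left--;
                    } else {
                        right++;
                    }
                }
                double h = Math.max(x[i] - x[left], x[right] - x[i]);

                // Weighted least-squares line in d = x[j] - x[i]; its intercept is the fit at x[i].
                double sw = 0, swd = 0, swy = 0, swdd = 0, swdy = 0;
                for (int j = left; j <= right; j++) {
                    double d = x[j] - x[i];
                    double w = (h > 0 ? tricube(d / h) : 1.0) * robustness[j];
                    sw += w;
                    swd += w * d;
                    swy += w * y[j];
                    swdd += w * d * d;
                    swdy += w * d * y[j];
                }
                if (sw == 0.0) {
                    continue; // all neighbours down-weighted to zero: keep the previous estimate
                }
                double denom = sw * swdd - swd * swd;
                fit[i] = denom > 1e-12 * sw * swdd
                        ? (swdd * swy - swd * swdy) / denom
                        : swy / sw; // degenerate neighbourhood: fall back to the weighted mean
            }
            if (iter == robustnessIters) {
                break;
            }
            // Robustness update: bisquare of residual / (6 * median absolute residual).
            double[] absRes = new double[n];
            for (int i = 0; i < n; i++) {
                absRes[i] = Math.abs(y[i] - fit[i]);
            }
            double[] sorted = absRes.clone();
            Arrays.sort(sorted);
            double median = (n % 2 == 1) ? sorted[n / 2] : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
            if (median == 0.0) {
                break; // at least half the points are fitted exactly; nothing to re-weight
            }
            for (int i = 0; i < n; i++) {
                robustness[i] = bisquare(absRes[i] / (6.0 * median));
            }
        }
        return fit;
    }
}
```

      In this sketch, LoessSketch.smooth(x, y, 0.3, 4) would correspond to a bandwidth of 0.3 with 4 robustness iterations. A larger bandwidth gives a smoother curve, because each local line is fitted to more neighbours.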

      I tried to follow the code requirements as strictly as possible: the tests cover the code completely, there are no checkstyle warnings, etc. I wrote the code entirely from scratch, without borrowing any third-party licensed code.

      The method is fairly computationally intensive: 10000 points with a bandwidth of 0.3 and 4 robustness iterations take about 3.7 s on my machine, and in general the complexity is O(robustnessIters * n^2 * bandwidth). I don't know how to optimize it further; all implementations that I have found use exactly the same algorithm as mine for the one-dimensional case.
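
      To put the quoted figures in perspective, the complexity formula can be turned into a rough operation count. The helper below is a hypothetical illustration (the name LoessCost is not from the patch). It simply multiplies out robustnessIters * n * (bandwidth * n), taking the formula above at face value.

```java
/** Hypothetical helper that evaluates the complexity formula quoted in the description. */
final class LoessCost {
    private LoessCost() { }

    /** Approximate number of (point, neighbour) weight evaluations. */
    static long weightEvaluations(int n, double bandwidth, int robustnessIters) {
        long neighbours = Math.round(bandwidth * n); // points in each local neighbourhood
        return (long) robustnessIters * n * neighbours;
    }
}
```

      For n = 10000, a bandwidth of 0.3 and 4 iterations this gives 4 * 10000 * 3000 = 120,000,000 evaluations, so the reported 3.7 s works out to roughly 31 ns per evaluation.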

      Some TODOs, in vastly increasing order of complexity:

      • Make the weight function customizable: according to Cleveland, this is needed only in some exotic cases, for example where the desired approximation is discontinuous.
      • Make the degree of the locally fitted polynomial customizable: currently the algorithm does only a linear local regression; it might be useful to make it also use quadratic regression. Higher degrees are not worth it, according to Cleveland.
      • Generalize the algorithm to the multidimensional case: this will require A LOT of hard work.
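
      As a rough idea of what the first TODO could look like, the sketch below passes the weight function as a java.util.function.DoubleUnaryOperator (available since Java 8) to a single local linear fit. The class name KernelSketch, the method localLinearFit and the UNIFORM kernel are hypothetical and are not part of the patch. Tricube is the weight function Cleveland uses by default.

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative sketch of a pluggable weight function; not part of loess.patch. */
final class KernelSketch {
    private KernelSketch() { }

    /** Tricube weight: (1 - |u|^3)^3 for |u| < 1, else 0. */
    static final DoubleUnaryOperator TRICUBE = u -> {
        double a = Math.abs(u);
        if (a >= 1.0) {
            return 0.0;
        }
        double t = 1.0 - a * a * a;
        return t * t * t;
    };

    /** Uniform (boxcar) weight, shown only as an example of an alternative. */
    static final DoubleUnaryOperator UNIFORM = u -> Math.abs(u) <= 1.0 ? 1.0 : 0.0;

    /**
     * Weighted linear least-squares estimate at x0, using points within distance h
     * (h > 0) as weighted by the given kernel. Returns NaN if every weight is zero.
     */
    static double localLinearFit(double[] x, double[] y, double x0, double h,
                                 DoubleUnaryOperator kernel) {
        double sw = 0, swd = 0, swy = 0, swdd = 0, swdy = 0;
        for (int j = 0; j < x.length; j++) {
            double d = x[j] - x0;
            double w = kernel.applyAsDouble(d / h);
            sw += w;
            swd += w * d;
            swy += w * y[j];
            swdd += w * d * d;
            swdy += w * d * y[j];
        }
        if (sw == 0.0) {
            return Double.NaN;
        }
        double denom = sw * swdd - swd * swd;
        // The intercept of the line in d is the estimate at x0; fall back to the
        // weighted mean when the neighbourhood cannot determine a slope.
        return denom > 1e-12 * sw * swdd ? (swdd * swy - swd * swdy) / denom : swy / sw;
    }
}
```

      Both kernels reproduce exactly linear data, but they differ on curved data: for y = x^2 sampled at x = 0..4, with x0 = 2 and h = 3, the uniform kernel gives 6 while the tricube kernel gives a smaller value, because it down-weights the distant points.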

        Attachments

        1. loess.patch
          25 kB
          Eugene Kirpichov
        2. loess.patch.v2
          24 kB
          Eugene Kirpichov

            People

            • Assignee:
              luc Luc Maisonobe
              Reporter:
              jkff Eugene Kirpichov