Commons Math / MATH-1482

Pull request for GLSMultipleLinearRegression

Details

    • Type: Improvement
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved

    Description

      I would like to propose a pull request that adds an option to use a variance vector instead of a full covariance matrix. When the errors are uncorrelated but heteroscedastic, the covariance matrix is diagonal, so storing only its diagonal avoids unnecessary memory usage and computation and makes it possible to work with very large input matrices. In such cases, using a variance vector reduces time complexity from O(N^2) to O(N), where N is the number of observations, and dramatically reduces memory usage. For example, I needed to train a generalized linear model. The iteratively reweighted least squares (IRLS) algorithm requires a weighted regression, in my case with more than a million observations. The current implementation would require approximately 12 terabytes of memory, while the patched version needs only 8 megabytes. Since IRLS is an iterative algorithm, a million-fold reduction in complexity is also very useful.

      https://github.com/apache/commons-math/pull/106
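
To illustrate the idea described above, here is a minimal sketch of weighted least squares with a per-observation variance vector. It is not the code from the pull request and does not use the Commons Math API; the class name `DiagonalGls` and the method `fit` are hypothetical, and plain arrays are assumed in place of the library's matrix types. The point is only that, with a diagonal covariance, the weighted normal equations can be accumulated row by row without ever building an N x N matrix.

```java
/** Hypothetical illustration: GLS with a diagonal covariance given as a variance vector. */
public final class DiagonalGls {
    private DiagonalGls() {}

    /**
     * Returns the coefficients beta that minimize sum_i (y[i] - x[i].beta)^2 / variance[i].
     * Extra memory is O(p^2) for p regressors; the N x N covariance matrix is never formed.
     */
    public static double[] fit(double[][] x, double[] y, double[] variance) {
        int n = x.length;
        int p = x[0].length;

        // Augmented system [X' W X | X' W y] with W = diag(1 / variance).
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < n; i++) {
            double w = 1.0 / variance[i];
            for (int j = 0; j < p; j++) {
                double wx = w * x[i][j];
                for (int k = 0; k < p; k++) {
                    a[j][k] += wx * x[i][k];
                }
                a[j][p] += wx * y[i];
            }
        }

        // Gaussian elimination with partial pivoting on the p x (p + 1) system.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (a[col][col] == 0.0) {
                throw new IllegalArgumentException("Design matrix is rank deficient");
            }
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }

        // Back substitution.
        double[] beta = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double s = a[r][p];
            for (int c = r + 1; c < p; c++) {
                s -= a[r][c] * beta[c];
            }
            beta[r] = s / a[r][r];
        }
        return beta;
    }
}
```

In an IRLS loop, a routine of this shape would be called once per iteration with an updated variance (or weight) vector, so the per-iteration cost grows linearly with the number of observations.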


            People

              Assignee: Unassigned
              Reporter: Elena Kartysheva (karl_crl)