  1. Commons Math
  2. MATH-607

The current multiple regression object does all calculations with the data in core. There are out-of-core techniques that would be useful with large datasets.

Details

    Description

      The current multiple regression class performs a QR decomposition on the complete data set, which requires loading the entire data set into memory. This is not practical for large data sets, especially when data mining or stepwise regression is also required. There are techniques that form the normal equations on the fly, as well as ones that update the QR decomposition incrementally as observations arrive. I am proposing, first, the specification of an "UpdatingLinearRegression" interface that defines the basic functionality all such techniques must provide.
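
      To make the idea concrete, here is a minimal sketch of what such an interface and one "normal equations on the fly" implementation might look like. The method names (addObservation, getN, clear, regress) and the class NormalEquationsRegression are assumptions made for illustration; they are not taken from the attached patches. The sketch always fits an intercept and solves the accumulated system by Gaussian elimination, which is simple but numerically weaker than an updating QR approach.

```java
import java.util.Arrays;

/** Hypothetical sketch of the proposed interface; method names are assumptions. */
interface UpdatingLinearRegression {
    void addObservation(double[] x, double y);
    long getN();
    void clear();
    double[] regress();
}

/** Accumulates X'X and X'y one observation at a time; no raw data is retained. */
class NormalEquationsRegression implements UpdatingLinearRegression {
    private final int p;          // number of coefficients, including the intercept
    private final double[][] xtx; // running X'X
    private final double[] xty;   // running X'y
    private long n;               // observations seen so far

    NormalEquationsRegression(int numberOfRegressors) {
        p = numberOfRegressors + 1;
        xtx = new double[p][p];
        xty = new double[p];
    }

    @Override
    public void addObservation(double[] x, double y) {
        if (x.length != p - 1) {
            throw new IllegalArgumentException("expected " + (p - 1) + " regressors");
        }
        double[] row = new double[p];
        row[0] = 1.0; // intercept column
        System.arraycopy(x, 0, row, 1, p - 1);
        for (int i = 0; i < p; i++) {
            xty[i] += row[i] * y;
            for (int j = 0; j < p; j++) {
                xtx[i][j] += row[i] * row[j];
            }
        }
        n++;
    }

    @Override
    public long getN() {
        return n;
    }

    @Override
    public void clear() {
        for (double[] r : xtx) {
            Arrays.fill(r, 0.0);
        }
        Arrays.fill(xty, 0.0);
        n = 0;
    }

    @Override
    public double[] regress() {
        if (n < p) {
            throw new IllegalStateException("not enough observations");
        }
        // Build the augmented matrix [X'X | X'y] on copies so the sums stay intact.
        double[][] a = new double[p][];
        for (int i = 0; i < p; i++) {
            a[i] = Arrays.copyOf(xtx[i], p + 1);
            a[i][p] = xty[i];
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalStateException("singular normal equations");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = a[i][p];
            for (int j = i + 1; j < p; j++) {
                s -= a[i][j] * beta[j];
            }
            beta[i] = s / a[i][i];
        }
        return beta;
    }
}
```

      With this sketch, a caller would read rows from a file or database cursor and pass each one to addObservation, so memory use depends only on the number of regressors and not on the number of rows. The element at index 0 of the array returned by regress() is the intercept.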

      Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This ensures that subsequent additions of observations do not corrupt the parameter estimates or render them inconsistent. I am calling this interface "RegressionResults".
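
      As an illustration of the immutability requirement, the following is a minimal sketch of a results interface with one immutable implementation. The accessor names and the class ImmutableRegressionResults are assumptions for this example and are not taken from the attached patches. The key point is that arrays are copied on the way in and on the way out, so neither the regression object nor the caller can alter a snapshot after it has been created.

```java
/** Hypothetical sketch of a results interface; accessor names are assumptions. */
interface RegressionResults {
    double[] getParameterEstimates();
    double getParameterEstimate(int index);
    long getN();
}

/** Immutable snapshot: all fields are final and arrays are defensively copied. */
final class ImmutableRegressionResults implements RegressionResults {
    private final double[] parameters;
    private final long numberOfObservations;

    ImmutableRegressionResults(double[] parameters, long numberOfObservations) {
        this.parameters = parameters.clone(); // copy in: later changes by the caller are not seen
        this.numberOfObservations = numberOfObservations;
    }

    @Override
    public double[] getParameterEstimates() {
        return parameters.clone(); // copy out: callers cannot modify internal state
    }

    @Override
    public double getParameterEstimate(int index) {
        return parameters[index];
    }

    @Override
    public long getN() {
        return numberOfObservations;
    }
}
```

      A regression object that keeps accepting observations can hand out one such snapshot per call; earlier snapshots keep reporting the estimates and observation count they were created with.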

      Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.

      Thanks,

      -Greg

      Attachments

        1. updating_reg_ifaces
          11 kB
          greg sterijevski
        2. updating_reg_cut2
          17 kB
          greg sterijevski
        3. RegressResults2
          4 kB
          greg sterijevski
        4. regres_change1
          1.0 kB
          greg sterijevski
        5. millerregtest
          81 kB
          greg sterijevski
        6. millerreg_take2
          18 kB
          greg sterijevski
        7. millerreg
          35 kB
          greg sterijevski

        Activity

          People

            Assignee: Unassigned
            Reporter: greg sterijevski
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 840h
                Remaining: 840h
                Logged: Not Specified