Details

    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: 0.13.1
    • Fix Version/s: None
    • Component/s: Algorithms
    • Labels: None

      Description

      Implement Generalized Linear Models (GLM).

      https://en.wikipedia.org/wiki/Generalized_linear_model
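      For reference, this is the standard textbook definition (not anything Mahout-specific): a GLM relates the expected response to a linear predictor through a link function g. The three regressions discussed in the comments on this issue (linear, logistic and Poisson) correspond to these canonical links:

```latex
g\bigl(\mathbb{E}[Y \mid X]\bigr) = X\beta,
\qquad
g_{\text{linear}}(\mu) = \mu,
\quad
g_{\text{logistic}}(\mu) = \log\frac{\mu}{1-\mu},
\quad
g_{\text{Poisson}}(\mu) = \log\mu
```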


          Activity

          Saikat Kanjilal added a comment:

          More progress: I added a passing unit test for the new GlmModel here: https://github.com/skanjila/mahout/blob/mahout-1929/math-scala/src/test/scala/org/apache/mahout/math/algorithms/GlmSuiteBase.scala

          I also refactored the GlmModel code to leverage LinearRegressorModel instead, which makes things a bit easier given the current architecture: https://github.com/skanjila/mahout/blob/mahout-1929/math-scala/src/main/scala/org/apache/mahout/math/algorithms/regression/GlmModel.scala

          Trevor Grant, any chance we could do a quick Google Hangout to discuss the design before I get too far ahead? I would love some feedback to make sure this goes smoothly. I know you are busy with 0.13, so let me know what works.

          Saikat Kanjilal added a comment:

          OK, more progress. I decided to follow the strategy scalation takes in implementing GLM (https://github.com/scalation/scalation/blob/master/src/main/scala/scalation/analytics/par/GLM.scala). I have:
          1) Added a set of apply functions that reuse our existing infrastructure to compute the models that already exist, namely linear regression and ordinary least squares
          2) Placed this code inside GlmModel for now

          Code is here: https://github.com/skanjila/mahout/blob/mahout-1929/math-scala/src/main/scala/org/apache/mahout/math/algorithms/regression/GlmModel.scala

          I would really appreciate a look from all the interested parties identified in the earlier comments before I get much farther. Next steps include:
          1) Unit tests for the apply functions
          2) Figure out a creative way to tie the other functions below into the apply API infrastructure:

          def fit(drmX: DrmLike[K],
                  drmTarget: DrmLike[K],
                  hyperparameters: (Symbol, Any)*): GlmModel[K]

          def setStandardHyperparameters(hyperparameters: Map[Symbol, Any] = Map('foo -> None)): Unit

          def calculateStandardError[M[K] <: GlmModel[K]](X: DrmLike[K],
                                                           drmTarget: DrmLike[K],
                                                           drmXtXinv: Matrix,
                                                           model: M[K]): M[K]

          def modelPostprocessing[M[K] <: GlmModel[K]](model: M[K],
                                                        X: DrmLike[K],
                                                        drmTarget: DrmLike[K],
                                                        drmXtXinv: Matrix): M[K]

          Here are my thoughts on the above. My philosophy is that we keep as much as possible of the existing infrastructure Trevor Grant has put in place, so these functions may simply need to be implemented inside each of the derived classes. I would like the apply function to eventually do the following:
          1) compute the model
          2) train the model
          3) do the prediction
          4) build quality measures around each of the models

          Thoughts?
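          A minimal sketch of the four-step apply flow described above, assuming plain Scala arrays instead of Mahout DRMs and covering only a one-feature Gaussian (linear) model. SimpleLinearModel, QualityMeasures and GlmPipeline are hypothetical names, "compute the model" is read here as choosing the model for the requested family, and MSE and R² merely stand in for whatever quality measures are eventually chosen.

```scala
// Hypothetical sketch: SimpleLinearModel, QualityMeasures and GlmPipeline are
// illustrative names, not existing Mahout classes; data are plain arrays, not DRMs.
final case class SimpleLinearModel(intercept: Double, slope: Double) {
  def predict(x: Array[Double]): Array[Double] = x.map(xi => intercept + slope * xi)
}

final case class QualityMeasures(mse: Double, rSquared: Double)

object GlmPipeline {
  // Step 2: train by ordinary least squares with one feature plus an intercept.
  private def train(x: Array[Double], y: Array[Double]): SimpleLinearModel = {
    val n = x.length.toDouble
    val xMean = x.sum / n
    val yMean = y.sum / n
    val sxy = x.indices.map(i => (x(i) - xMean) * (y(i) - yMean)).sum
    val sxx = x.map(xi => (xi - xMean) * (xi - xMean)).sum
    val slope = sxy / sxx
    SimpleLinearModel(yMean - slope * xMean, slope)
  }

  // Step 4: quality measures computed from the observed and predicted values.
  private def quality(y: Array[Double], yHat: Array[Double]): QualityMeasures = {
    val yMean = y.sum / y.length
    val sse = y.indices.map(i => math.pow(y(i) - yHat(i), 2)).sum
    val sst = y.map(yi => math.pow(yi - yMean, 2)).sum
    QualityMeasures(sse / y.length, 1.0 - sse / sst)
  }

  def apply(family: String, x: Array[Double], y: Array[Double])
      : (SimpleLinearModel, Array[Double], QualityMeasures) = {
    // Step 1: choose which model to compute; only the Gaussian case is sketched.
    require(family == "gaussian", s"family '$family' is not covered by this sketch")
    require(x.length == y.length && x.distinct.length > 1, "need paired data with varying x")
    val model = train(x, y)          // Step 2: train
    val yHat = model.predict(x)      // Step 3: predict
    (model, yHat, quality(y, yHat))  // Step 4: quality measures
  }
}
```

          Keeping each step as its own private function is one way to let other families swap in their own training step while reusing prediction and quality reporting.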

          Saikat Kanjilal added a comment:

          Trevor Grant, I've spent the past few days looking at existing GLM implementations that would fit cleanly into Mahout's current architecture (namely LinearRegressorFitter, LinearRegressorModel, etc.). Between the implementation I already brought across (https://github.com/BIDData/BIDMach/blob/master/src/main/scala/BIDMach/models/GLM.scala) and this one (https://github.com/scalation/scalation/blob/master/src/main/scala/scalation/analytics/par/GLM.scala), I am leaning towards the latter. We can mimic its structure of having apply functions inside GLM while keeping the rest of our infrastructure inside each of the regression classes; this way we can reuse the functions already created inside RegressorFitter and RegressorModel. One idea to go along with this is to move these functions into GLMFitter and GLMModel and then have each of the derived regressor classes derive from them.

          I would like general brainstorming input from you, Andrew Musselman and Jim Jagielski, to avoid design-by-one-person syndrome.
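          A minimal sketch of the proposed hierarchy, model side only: a base trait holds the shared prediction logic, each derived regressor supplies just its inverse link, and an object with an apply function picks the derived class, mirroring the scalation structure described above. All names here (GlmModelBase, LinearGlmModel, LogisticGlmModel, PoissonGlmModel, GlmSketch) are hypothetical, and the coefficients are assumed to be already fitted.

```scala
// Hypothetical sketch: GlmModelBase, LinearGlmModel, LogisticGlmModel,
// PoissonGlmModel and GlmSketch are illustrative names, not Mahout classes.
trait GlmModelBase {
  // Intercept first, then one weight per feature.
  def coefficients: Array[Double]

  // Inverse link g^-1: maps the linear predictor eta to the mean response.
  def inverseLink(eta: Double): Double

  // Shared prediction logic that every derived regressor inherits.
  def predict(features: Array[Double]): Double = {
    require(features.length == coefficients.length - 1, "feature count mismatch")
    val eta = coefficients.head +
      features.indices.map(i => coefficients(i + 1) * features(i)).sum
    inverseLink(eta)
  }
}

// Each derived model only supplies its inverse link.
final case class LinearGlmModel(coefficients: Array[Double]) extends GlmModelBase {
  def inverseLink(eta: Double): Double = eta // identity link
}

final case class LogisticGlmModel(coefficients: Array[Double]) extends GlmModelBase {
  def inverseLink(eta: Double): Double = 1.0 / (1.0 + math.exp(-eta)) // logit link
}

final case class PoissonGlmModel(coefficients: Array[Double]) extends GlmModelBase {
  def inverseLink(eta: Double): Double = math.exp(eta) // log link
}

// scalation-style entry point: apply picks the derived model for a family name.
object GlmSketch {
  def apply(family: String, coefficients: Array[Double]): GlmModelBase = family match {
    case "gaussian" => LinearGlmModel(coefficients)
    case "binomial" => LogisticGlmModel(coefficients)
    case "poisson"  => PoissonGlmModel(coefficients)
    case other      => throw new IllegalArgumentException(s"unsupported family: $other")
  }
}
```

          Putting predict in the base trait is what lets the derived regressors share code; the fitter side (GLMFitter in the proposal) could follow the same pattern.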

          Saikat Kanjilal added a comment (edited):

          Trevor Grant, for simplicity's sake let's keep all work and conversation about GLM in this JIRA only. MAHOUT-1941 is linked to this issue and will be indirectly fixed once GLM is in place. That said, I spent some time thinking about what you proposed, and GLM seems like a superclass for linear, Poisson and logistic regression. This is why I had GLM derive from Regressor rather than from the linear regressor: at the end of this implementation, linear regression will be one of the choices GLM offers, but not the only one. If it makes sense to reuse the functionality from the linear regressor, what do you think about creating a separate GLM regressor that all the other regressors can derive from? I also added a stubbed-out unit test here: https://github.com/skanjila/mahout/blob/mahout-1929/math-scala/src/test/scala/org/apache/mahout/math/algorithms/GlmSuiteBase.scala

          Thoughts?
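          One way to make "GLM as a superclass for linear, Poisson and logistic regression" concrete is a single fitting loop parameterized by family. The sketch below uses iteratively reweighted least squares (IRLS), the standard GLM fitting method, which may differ from what the BIDMach or scalation code does. Family, Gaussian, Poisson, Binomial and Irls are hypothetical names; it runs on in-memory arrays rather than DRMs and has no safeguards for separable data or zero weights.

```scala
// Hypothetical sketch: Family, Gaussian, Poisson, Binomial and Irls are
// illustrative names. Rows of x must include a leading 1.0 for the intercept.
sealed trait Family {
  def inverseLink(eta: Double): Double // mu = g^-1(eta)
  def dMuDEta(eta: Double): Double     // derivative of the inverse link
  def variance(mu: Double): Double     // Var(Y) as a function of mu (up to dispersion)
}

case object Gaussian extends Family { // identity link
  def inverseLink(eta: Double): Double = eta
  def dMuDEta(eta: Double): Double = 1.0
  def variance(mu: Double): Double = 1.0
}

case object Poisson extends Family { // log link
  def inverseLink(eta: Double): Double = math.exp(eta)
  def dMuDEta(eta: Double): Double = math.exp(eta)
  def variance(mu: Double): Double = mu
}

case object Binomial extends Family { // logit link, 0/1 responses
  def inverseLink(eta: Double): Double = 1.0 / (1.0 + math.exp(-eta))
  def dMuDEta(eta: Double): Double = { val mu = inverseLink(eta); mu * (1.0 - mu) }
  def variance(mu: Double): Double = mu * (1.0 - mu)
}

object Irls {
  // Solves a * x = b by Gaussian elimination with partial pivoting (small p only).
  private def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val n = b.length
    val m = a.map(_.clone)
    val v = b.clone
    for (col <- 0 until n) {
      val pivot = (col until n).maxBy(r => math.abs(m(r)(col)))
      val tmpRow = m(col); m(col) = m(pivot); m(pivot) = tmpRow
      val tmpVal = v(col); v(col) = v(pivot); v(pivot) = tmpVal
      for (r <- col + 1 until n) {
        val f = m(r)(col) / m(col)(col)
        for (c <- col until n) m(r)(c) -= f * m(col)(c)
        v(r) -= f * v(col)
      }
    }
    val x = new Array[Double](n)
    for (r <- n - 1 to 0 by -1) {
      val s = (r + 1 until n).map(c => m(r)(c) * x(c)).sum
      x(r) = (v(r) - s) / m(r)(r)
    }
    x
  }

  def fit(family: Family, x: Array[Array[Double]], y: Array[Double],
          maxIter: Int = 50, tol: Double = 1e-10): Array[Double] = {
    val p = x.head.length
    var beta = new Array[Double](p) // start from all-zero coefficients
    var iter = 0
    var delta = Double.PositiveInfinity
    while (iter < maxIter && delta > tol) {
      val xtwx = Array.fill(p, p)(0.0) // accumulates X'WX
      val xtwz = new Array[Double](p)  // accumulates X'Wz
      for (i <- y.indices) {
        val eta = x(i).indices.map(j => x(i)(j) * beta(j)).sum
        val mu = family.inverseLink(eta)
        val d = family.dMuDEta(eta)
        val w = d * d / family.variance(mu) // IRLS working weight
        val z = eta + (y(i) - mu) / d       // working response
        for (j <- 0 until p) {
          xtwz(j) += w * x(i)(j) * z
          for (k <- 0 until p) xtwx(j)(k) += w * x(i)(j) * x(i)(k)
        }
      }
      val next = solve(xtwx, xtwz)
      delta = next.indices.map(j => math.abs(next(j) - beta(j))).max
      beta = next
      iter += 1
    }
    beta
  }
}
```

          With the Gaussian family the first iteration already yields the ordinary least squares solution, which is one way to read linear regression as just one of the choices GLM offers. The X'WX and X'Wz accumulations are the parts that would need distributed matrix operations in Mahout.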

          Saikat Kanjilal added a comment:

          OK, time to get down to business. Jim Jagielski and Aditya, I've started the implementation here:

          https://github.com/skanjila/mahout/blob/mahout-1929/math-scala/src/main/scala/org/apache/mahout/math/algorithms/regression/GlmModel.scala

          I've brought over the GLM implementation from https://github.com/BIDData/BIDMach/blob/master/src/main/scala/BIDMach/models/GLM.scala and merged it with implementations of the methods from the RegressorModel and RegressorFitter traits.

          Next steps:
          1) Take a look at the methods that I've commented out and come up with a way to incorporate them into the object model established by Trevor Grant
          2) Propose some ideas on the dev list or JIRA for all the different types of regression we want to support; to get this off the ground, I propose we stick to linear and logistic regression
          3) Start stubbing out unit tests and check them into my branch so that we can collaborate

          Andrew Musselman, Trevor Grant: I would love some feedback on the general approach as well.
          Thanks

          Andrew Musselman added a comment:

          Sounds good; let's shoot for a point release soon if you can, or definitely for 0.14.

          Thanks!

          Saikat Kanjilal added a comment:

          Trevor Grant, I was going to start working on this along with ALS. Here's my thinking on the overall approach:

          1) Post an API proposal for this to the Mahout dev list
          2) Start working on unit tests and then the code
          3) I browsed the web and found this implementation (https://github.com/BIDData/BIDMach/blob/master/src/main/scala/BIDMach/models/GLM.scala), which may give us some inspiration. It follows a similar theme: it derives from a model and a regression model, and then uses options for all the hyperparameters. One thought I had was to take the gist of this and replace all the matrix operations with Samsara functions instead

          What do you think of the above plan?
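          To illustrate the "options for all the hyperparameters" idea in Scala terms, here is a minimal sketch, assuming the `(Symbol, Any)*` style from the `fit` signature proposed in another comment on this issue, that merges caller-supplied hyperparameters over defaults and checks their types. GlmSettings and the keys family, iterations and tolerance (with their default values) are hypothetical.

```scala
// Hypothetical sketch: GlmSettings and the keys 'family, 'iterations and
// 'tolerance are illustrative, not existing Mahout names.
final case class GlmSettings(family: String, iterations: Int, tolerance: Double)

object GlmSettings {
  // Defaults applied when a key is missing from the caller's hyperparameters.
  private val defaults: Map[Symbol, Any] = Map(
    Symbol("family")     -> "gaussian",
    Symbol("iterations") -> 25,
    Symbol("tolerance")  -> 1e-8
  )

  def fromHyperparameters(hyperparameters: (Symbol, Any)*): GlmSettings = {
    // Caller-supplied entries override defaults: ++ keeps the right-hand value.
    val merged = defaults ++ hyperparameters.toMap

    // Each value is type-checked so a wrong type fails loudly instead of later.
    val family = merged(Symbol("family")) match {
      case s: String => s
      case other     => throw new IllegalArgumentException(s"family must be a String, got $other")
    }
    val iterations = merged(Symbol("iterations")) match {
      case n: Int => n
      case other  => throw new IllegalArgumentException(s"iterations must be an Int, got $other")
    }
    val tolerance = merged(Symbol("tolerance")) match {
      case d: Double => d
      case other     => throw new IllegalArgumentException(s"tolerance must be a Double, got $other")
    }
    GlmSettings(family, iterations, tolerance)
  }
}
```

          For example, `GlmSettings.fromHyperparameters(Symbol("family") -> "poisson")` switches the family while keeping the default iteration count and tolerance. `Symbol("...")` is used instead of the `'family` literal syntax because symbol literals are deprecated in Scala 2.13.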


            People

            • Assignee: Unassigned
            • Reporter: Trevor Grant
            • Votes: 0
            • Watchers: 3
