OK, more progress. I decided to follow the strategy that scalation takes in implementing GLM (https://github.com/scalation/scalation/blob/master/src/main/scala/scalation/analytics/par/GLM.scala). So far I have:

1) Added a set of apply functions that reuse our existing infrastructure to construct the models we already have, namely Linear Regression and Ordinary Least Squares

2) Placed this code inside GlmModel for now

Code is here: https://github.com/skanjila/mahout/blob/mahout-1929/math-scala/src/main/scala/org/apache/mahout/math/algorithms/regression/GlmModel.scala

I would really appreciate a look from all interested parties identified in the comments above before I get much further. Next steps include:

1) unit tests for the apply functions

2) figure out a clean way to tie the other functions into the apply API infrastructure:

def fit(drmX: DrmLike[K],
        drmTarget: DrmLike[K],
        hyperparameters: (Symbol, Any)*): GlmModel[K]

def setStandardHyperparameters(hyperparameters: Map[Symbol, Any] = Map('foo -> None)): Unit

def calculateStandardError[M[K] <: GlmModel[K]](X: DrmLike[K],
                                                drmTarget: DrmLike[K],
                                                drmXtXinv: Matrix,
                                                model: M[K]): M[K]

def modelPostprocessing[M[K] <: GlmModel[K]](model: M[K],
                                             X: DrmLike[K],
                                             drmTarget: DrmLike[K],
                                             drmXtXinv: Matrix): M[K]
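To make the scalation-style apply-factory idea concrete, here is a minimal plain-Scala sketch of the dispatch pattern: a companion-style object whose apply overloads wire hyperparameters into the existing model types. All names here (Regressor, Glm, the technique symbols) are illustrative stand-ins, not the actual Mahout classes.

```scala
// Illustrative stand-ins for the existing regression models (not Mahout's real types).
trait Regressor {
  def name: String
  def hyperparameters: Map[Symbol, Any]
}

class OrdinaryLeastSquares(val hyperparameters: Map[Symbol, Any]) extends Regressor {
  val name = "OLS"
}

class LinearRegression(val hyperparameters: Map[Symbol, Any]) extends Regressor {
  val name = "LinearRegression"
}

// The factory: each branch reuses an existing model, passing hyperparameters through,
// mirroring how scalation's GLM object dispatches to its concrete regressors.
object Glm {
  def apply(technique: Symbol, hyperparameters: (Symbol, Any)*): Regressor =
    technique.name match {
      case "ols"    => new OrdinaryLeastSquares(hyperparameters.toMap)
      case "linear" => new LinearRegression(hyperparameters.toMap)
      case other    => throw new IllegalArgumentException(s"unknown technique: $other")
    }
}
```

Usage would look like `Glm(Symbol("ols"), Symbol("calcIntercept") -> true)`; the point is that the caller never names the concrete class.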

Here are my thoughts on the above. My philosophy is to keep as much as possible of the existing infrastructure that Trevor Grant has put in place, so these functions may simply need to be implemented inside each of the derived classes. I would like the apply function eventually to: 1) compute the model, 2) train the model, 3) do the prediction, and 4) build quality measures around each of the models.
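The four-step flow can be sketched end to end with a toy, self-contained model; a one-variable no-intercept least-squares fit stands in for the real DRM-based code, and every name here is hypothetical.

```scala
object ToyGlm {
  case class FittedModel(slope: Double, r2: Double) {
    def predict(xs: Seq[Double]): Seq[Double] = xs.map(_ * slope)
  }

  def fit(x: Seq[Double], y: Seq[Double]): FittedModel = {
    // 1) compute + 2) train: closed-form slope of a no-intercept least-squares fit
    val slope = x.zip(y).map { case (a, b) => a * b }.sum / x.map(v => v * v).sum
    // 3) predict on the training data
    val yHat = x.map(_ * slope)
    // 4) quality measure: coefficient of determination R^2
    val yBar  = y.sum / y.length
    val ssRes = y.zip(yHat).map { case (a, b) => (a - b) * (a - b) }.sum
    val ssTot = y.map(a => (a - yBar) * (a - yBar)).sum
    FittedModel(slope, 1.0 - ssRes / ssTot)
  }
}
```

In the real code each step would operate on DrmLike[K] operands, but the shape of the flow (fit returns a model already carrying its quality statistics) would be the same.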

Thoughts ?

Trevor Grant, I was going to start working on this along with ALS. Here's my thinking on the overall approach:

1) Put up an API for this to the mahout dev list

2) Start working on unit tests and then the code

3) I browsed the web and found this implementation, which may give us some ideas: https://github.com/BIDData/BIDMach/blob/master/src/main/scala/BIDMach/models/GLM.scala. It has a similar theme: they derive from a Model and a RegressionModel, and they use options objects for all the hyperparameters. One thought I had was to take the gist of this and replace all the matrix operations with Samsara functions instead
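For reference, the BIDMach-style "options object" pattern looks roughly like the sketch below; the field names and defaults are made up for illustration and are not BIDMach's or Mahout's actual ones.

```scala
// Mutable options class: every hyperparameter has a default, and callers
// override only the fields they care about (BIDMach-style, names illustrative).
class GlmOpts {
  var lrate: Double     = 1e-3      // learning rate
  var maxIter: Int      = 100       // iteration cap
  var links: Array[Int] = Array(0)  // per-target link-function id
}

// The learner carries its options object rather than a long parameter list.
class GlmLearner(val opts: GlmOpts = new GlmOpts)
```

Usage: `val o = new GlmOpts; o.lrate = 0.1; val learner = new GlmLearner(o)`. This is one alternative to our current `(Symbol, Any)*` varargs; the options class is type-checked at compile time, while the symbol map is more uniform across algorithms.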

What do you think of the above plan?