Description
I believe PySpark's mllib module should support a GLM feature that also allows models to be defined using a formula. This is already done in the Python package statsmodels: http://statsmodels.sourceforge.net/devel/example_formulas.html
The formula feature can be implemented using the Python module patsy.
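As a rough illustration of what patsy provides, the sketch below turns a formula string into a response vector and a design matrix, which is the input a GLM fit would need. The column names `y`, `x1`, `x2` and the data values are made up for the example, and nothing here touches Spark; how the matrices would be handed to mllib is left open.

```python
import numpy as np
from patsy import dmatrices

# Toy data; the names y, x1, x2 are hypothetical and only serve the example.
data = {
    "y": np.array([1.0, 2.0, 3.0, 4.0]),
    "x1": np.array([0.5, 1.5, 2.5, 3.5]),
    "x2": np.array([1.0, 0.0, 1.0, 0.0]),
}

# Parse the formula and build the response (y) and design (X) matrices.
# patsy adds an intercept column by default.
y, X = dmatrices("y ~ x1 + x2", data)

print(X.design_info.column_names)  # ['Intercept', 'x1', 'x2']
print(np.asarray(X).shape)         # (4, 3)
```

A PySpark implementation could presumably use the same parsing step to decide which columns to assemble into the feature vector, but that part is an assumption and would need to be worked out.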
Currently, SparkR supports a GLM module with the formula feature.
I can take a shot at implementing the feature.
Issue Links
- Is contained by: SPARK-11106 Should ML Models contains single models or Pipelines? (Resolved)