Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Description
Currently, MLlib modeling tools work only with double-valued data. In practice, however, data tables often contain categorical fields (factors in R) that must be converted to a set of 0/1 indicator variables, so that the data actually used by a modeling algorithm is completely numeric. In R, modeling functions handle this conversion with the model.matrix function. Similar functionality needs to be available within Spark.