  1. Spark
  2. SPARK-6823

Add a model.matrix like capability to DataFrames (modelDataFrame)

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML, SparkR

    Description

      Currently, MLlib modeling tools work only with double data. However, data tables in practice often have a set of categorical fields (factors in R) that need to be converted to a set of 0/1 indicator variables, so that the data actually used by a modeling algorithm is completely numeric. In R, modeling functions handle this with the model.matrix function. Similar functionality needs to be available within Spark.
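
To illustrate the conversion described above, here is a minimal sketch in plain Python. It is not Spark or MLlib code and not a proposed API. The function name `expand_categoricals` and the sample rows are hypothetical. It assumes treatment-style coding, where the first level (in sorted order) of each categorical field is dropped as the baseline, and it omits the intercept column that R's model.matrix adds by default.

```python
def expand_categoricals(rows, categorical_fields):
    """Expand categorical fields into 0/1 indicator columns.

    rows: list of dicts that all share the same keys.
    categorical_fields: names of the fields to treat as categorical.
    Returns (header, matrix), where matrix is a list of lists of floats.
    """
    fields = list(rows[0])
    # Sorted distinct levels for each categorical field.
    levels = {f: sorted({row[f] for row in rows}) for f in categorical_fields}

    header = []
    for field in fields:
        if field in levels:
            # Drop the first level: it becomes the baseline (all indicators 0).
            header.extend(f"{field}_{lvl}" for lvl in levels[field][1:])
        else:
            header.append(field)

    matrix = []
    for row in rows:
        out = []
        for field in fields:
            if field in levels:
                out.extend(
                    1.0 if row[field] == lvl else 0.0
                    for lvl in levels[field][1:]
                )
            else:
                # Non-categorical fields are assumed to be numeric.
                out.append(float(row[field]))
        matrix.append(out)
    return header, matrix


# Hypothetical sample data: one numeric field, one categorical field.
rows = [
    {"age": 31, "color": "red"},
    {"age": 45, "color": "blue"},
    {"age": 27, "color": "green"},
]
header, matrix = expand_categoricals(rows, ["color"])
```

With this sample input, "blue" is the baseline level, so the "color" field expands into two indicator columns and every output value is a double.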

          People

            Assignee: Unassigned
            Reporter: Shivaram Venkataraman (shivaram)
            Votes: 1
            Watchers: 7
