Spark / SPARK-17094

provide simplified API for ML pipeline


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      Many machine learning pipeline frameworks provide an API for easily assembling transformers.

      One example would be:

      val model = new Pipeline("tokenizer", "countvectorizer", "lda").fit(data)
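
      For comparison, the current Spark ML API wires each stage explicitly: `Tokenizer` and `CountVectorizer` through `setInputCol`/`setOutputCol`, and `LDA` through `setFeaturesCol`. The proposed one-liner would have to infer that wiring. Below is a minimal plain-Scala sketch (no Spark dependency) of one way to chain columns by stage name. The stage registry, the default `"text"` input column, and the `<stage>_output` naming scheme are hypothetical assumptions, not part of Spark.

```scala
// Plain-Scala model (no Spark dependency) of how a name-based pipeline
// could chain columns automatically. All names here are hypothetical.
final case class StageSpec(name: String, inputCol: String, outputCol: String)

object NamedPipeline {
  // Hypothetical set of stage names the simplified API might accept.
  private val known: Set[String] =
    Set("tokenizer", "hashingtf", "countvectorizer", "lda")

  /** Resolves stage names in order; each stage reads the previous stage's output column.
    * The default input column "text" and the "<stage>_output" naming are assumptions.
    */
  def resolve(names: Seq[String], inputCol: String = "text"): Seq[StageSpec] = {
    val (_, specs) = names.foldLeft((inputCol, Vector.empty[StageSpec])) {
      case ((in, acc), raw) =>
        val name = raw.toLowerCase // accept "hashingTF" as well as "hashingtf"
        require(known.contains(name), s"unknown stage: $raw")
        val out = s"${name}_output"
        (out, acc :+ StageSpec(name, in, out))
    }
    specs
  }
}
```

      For example, `NamedPipeline.resolve(Seq("tokenizer", "countvectorizer", "lda"))` yields three specs in which `countvectorizer` reads `tokenizer_output` and `lda` reads `countvectorizer_output`. A real implementation would need per-stage column mapping, since Spark's `LDA` reads `featuresCol` and writes `topicDistributionCol` rather than a generic output column.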
      

      Overall, the feature would:
      1. Allow people (especially beginners) to create an ML application in one simple line of code.
      2. Be handy for users, since they would not have to set the input and output columns.
      3. Looking further ahead, make it possible to build a Spark ML application without writing code at all, by configuration alone:

      "ml.pipeline.input": "hdfs://path.svm"
      "ml.pipeline": "tokenizer", "hashingTF", "lda"
      "ml.tokenizer.toLowercase": "false"
      ...
      

      This could make tuning on a cluster quite efficient, since parameters could be changed without modifying code.
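
      A minimal sketch of how such a configuration could be parsed into a stage list and per-stage parameters, assuming the key layout from the example above (`ml.pipeline.input`, `ml.pipeline`, `ml.<stage>.<param>`) and assuming the stage list is given as a single comma-separated string. `PipelineConfig` and `PipelineConfigParser` are hypothetical names; turning the parsed stages into actual Spark ML stages is left out.

```scala
// Hypothetical parser for a configuration-driven pipeline, following the key
// layout of the example: "ml.pipeline.input", "ml.pipeline", "ml.<stage>.<param>".
final case class PipelineConfig(
    input: Option[String],
    stages: Seq[String],
    params: Map[String, Map[String, String]]
)

object PipelineConfigParser {
  private val Prefix = "ml."
  private val Reserved = Set("ml.pipeline", "ml.pipeline.input")

  def parse(conf: Map[String, String]): PipelineConfig = {
    // Assumption: the stage list is one comma-separated string.
    val stages = conf.get("ml.pipeline").toSeq
      .flatMap(_.split(",").toSeq)
      .map(_.trim)
      .filter(_.nonEmpty)

    // Every other "ml.<stage>.<param>" key becomes a parameter of <stage>.
    val params = conf.toSeq
      .filter { case (key, _) => key.startsWith(Prefix) && !Reserved.contains(key) }
      .flatMap { case (key, value) =>
        val rest = key.stripPrefix(Prefix)
        rest.indexOf('.') match {
          case -1  => None // no parameter name after the stage: ignored
          case dot => Some((rest.take(dot), rest.drop(dot + 1), value))
        }
      }
      .groupBy(_._1)
      .map { case (stage, entries) =>
        stage -> entries.map { case (_, param, value) => param -> value }.toMap
      }

    PipelineConfig(conf.get("ml.pipeline.input"), stages, params)
  }
}
```

      With the example keys, `parse` returns the input path, the stages `tokenizer`, `hashingTF`, `lda`, and the parameters `Map("tokenizer" -> Map("toLowercase" -> "false"))`. Mapping each name to a Spark ML stage and applying its parameters would be a separate step.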

      Feedback and suggestions are appreciated.


          People

            Assignee: Unassigned
            Reporter: yuhao yang (yuhaoyan)
            Votes: 0
            Watchers: 5
