  Spark / SPARK-9941

Try ML pipeline API on Kaggle competitions


Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML

    Description

      This is an umbrella JIRA to track some fun tasks.

      We have built many features under the ML pipeline API, and we want to see how well it works on real-world datasets, e.g., Kaggle competition datasets (https://www.kaggle.com/competitions). We invite community members to help test it. The goal is NOT to win the competitions but to provide code examples and to identify missing features and other issues that can help shape the roadmap.

      For people who are interested, please do the following:

      1. Create a subtask (or leave a comment if you cannot create a subtask) to claim a Kaggle dataset.
      2. Use the ML pipeline API to build and tune an ML pipeline that works for the Kaggle dataset.
      3. Paste the code into a gist (https://gist.github.com/) and post the link here.
      4. Report missing features, issues, running times, and accuracy.


          People

            Assignee: Xiangrui Meng (mengxr)
            Reporter: Xiangrui Meng (mengxr)
            Votes: 4
            Watchers: 18
