Spark / SPARK-10117

Implement SQL data source API for reading LIBSVM data

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: ML
    • Labels: None

    Description

      It would be convenient to implement a data source API for the LIBSVM format, for better integration with DataFrames and the ML pipeline API.

      import org.apache.spark.ml.source.libsvm._
      
      val training = sqlContext.read
        .format("libsvm")
        .option("numFeatures", "10000")
        .load("path")
      

      This JIRA covers the following:

      1. Read LIBSVM data as a DataFrame with two columns: `label: Double` and `features: Vector`.
      2. Accept `numFeatures` as an option.
      3. The implementation should live under `org.apache.spark.ml.source.libsvm`.
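
      To illustrate what items 1 and 2 imply for each input line, here is a minimal sketch in plain Scala with no Spark dependency. It assumes the usual LIBSVM text layout, `label index:value index:value ...`, with one-based feature indices. `LibSVMRecord` and `LibSVMLineParser` are hypothetical names used only for illustration; they are not the classes under `org.apache.spark.ml.source.libsvm`, and the actual data source may differ in validation and error handling.

```scala
// Hypothetical illustration only, not the Spark implementation.
// Holds one parsed row: the label plus a sparse feature vector
// represented as parallel arrays of zero-based indices and values.
final case class LibSVMRecord(label: Double, indices: Array[Int], values: Array[Double])

object LibSVMLineParser {
  // Parses a line such as "1 1:0.5 3:2.0".
  // numFeatures plays the role of the `numFeatures` option: the vector size.
  def parse(line: String, numFeatures: Int): LibSVMRecord = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble

    val (indices, values) = tokens.tail.map { item =>
      val sep = item.indexOf(':')
      require(sep > 0, s"Malformed entry: $item")
      // Assumption: indices in the file are one-based; convert to zero-based.
      (item.substring(0, sep).toInt - 1, item.substring(sep + 1).toDouble)
    }.unzip

    require(
      indices.forall(i => i >= 0 && i < numFeatures),
      s"Feature index out of range for numFeatures = $numFeatures")

    LibSVMRecord(label, indices, values)
  }
}

val record = LibSVMLineParser.parse("1 1:0.5 3:2.0", numFeatures = 4)
println(s"label=${record.label}, indices=${record.indices.mkString(",")}")
```

      Passing the feature count explicitly, as the `numFeatures` option does, lets a reader fix the vector dimension up front rather than inferring it from the data.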

            People

              lewuathe Kai
              mengxr Xiangrui Meng
              Xiangrui Meng Xiangrui Meng