Description
It would be convenient to implement a data source API for the LIBSVM format, to give it better integration with DataFrames and the ML pipeline API. For example:
import org.apache.spark.ml.source.libsvm._

val training = sqlContext.read
  .format("libsvm")
  .option("numFeatures", "10000")
  .load("path")
This JIRA covers the following:
1. Read LIBSVM data as a DataFrame with two columns: label: Double and features: Vector.
2. Accept `numFeatures` as an option.
3. The implementation should live under `org.apache.spark.ml.source.libsvm`.
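To make the requirements above concrete, the sketch below parses one LIBSVM record into the pieces the data source needs: a `Double` label plus the sparse indices/values that would back a `Vector` in the `features` column. The `LibSVMLineParser` helper is hypothetical and for illustration only; it is not part of the proposed `org.apache.spark.ml.source.libsvm` implementation.

```scala
// Hypothetical helper illustrating the on-disk LIBSVM format:
//   "<label> <index1>:<value1> <index2>:<value2> ..."
// Indices are 1-based in the file; a sparse Vector uses 0-based indices.
object LibSVMLineParser {
  def parse(line: String): (Double, Array[Int], Array[Double]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val (indices, values) = tokens.tail.map { t =>
      val Array(i, v) = t.split(':')
      (i.toInt - 1, v.toDouble) // shift to 0-based for the sparse vector
    }.unzip
    (label, indices, values)
  }
}

// Example: one record with features at (1-based) positions 1 and 3
val (label, idx, vals) = LibSVMLineParser.parse("1.0 1:0.5 3:2.0")
// label == 1.0, idx == Array(0, 2), vals == Array(0.5, 2.0)
```

The `numFeatures` option matters because a single record does not reveal the full dimensionality; without it, the reader must scan the data to find the largest index.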
Issue Links
- is depended upon by SPARK-10518: Update code examples in spark.ml user guide to use LIBSVM data source instead of MLUtils (Resolved)
- is related to SPARK-10537: Document LIBSVM data source options in public doc and minor improvements (Resolved)
- relates to SPARK-11622: Make LibSVMRelation extends HadoopFsRelation and Add LibSVMOutputWriter (Resolved)
- links to