Description
This is an umbrella JIRA for MechCoder's GSoC 2015 project. The main theme is to enhance MLlib's Python API and bring it to parity with the Scala/Java API. The main tasks are:
1. For all models in MLlib, provide save/load methods. This also includes save/load in Scala.
2. Python API for evaluation metrics.
3. Python API for streaming ML algorithms.
4. Python API for distributed linear algebra.
5. Simplify MLLibPythonAPI using DataFrames. Currently, we use customized serialization, which makes MLLibPythonAPI hard to maintain. It would be nice to use DataFrames for serialization instead.
I'll link the JIRAs for each of the tasks.
Note that this doesn't mean all these JIRAs are pre-assigned to MechCoder. The TODO list will be dynamic based on the backlog.
Issue Links
- contains
  - SPARK-4118 Create python bindings for Streaming KMeans (Resolved)
  - SPARK-4127 Streaming Linear Regression- Python bindings (Resolved)
  - SPARK-5989 Model import/export for LDAModel (Resolved)
  - SPARK-7633 Streaming Logistic Regression- Python bindings (Resolved)
  - SPARK-7785 Add pretty printing to pyspark.mllib.linalg.Matrices (Resolved)
  - SPARK-7844 Broken tests in KernelDensity (Resolved)
  - SPARK-7946 DecayFactor wrongly set in StreamingKMeans (Resolved)
  - SPARK-8032 Make NumPy version checking in mllib/__init__.py (Resolved)
  - SPARK-6390 Add MatrixUDT in PySpark (Resolved)
  - SPARK-7639 Add Python API for Statistics.kernelDensity (Resolved)
  - SPARK-8265 Add LinearDataGenerator to pyspark.mllib.utils (Resolved)
  - SPARK-8140 Remove empty model check in StreamingLinearAlgorithm (Resolved)
  - SPARK-8291 Add parse functionality to LabeledPoint in PySpark (Closed)
- is related to
  - SPARK-13489 GSoC 2016 project ideas for MLlib (Closed)
- relates to
  - SPARK-6254 MLlib Python API parity check at 1.3 release (Closed)