Here is a post I published on the dev mailing list:
Hi All Spark Devs,
I am Kai Jiang, a master's student majoring in Computer Science. My interests are machine learning
and distributed systems, so I have been contributing to the Spark codebase since last year. My
pull requests are related to MLlib, PySpark and SQL (https://github.com/apache/spark/pulls/vectorijk).
Last year, I was impressed by MechCoder's project, mentored by mengxr. This year, I look forward
to having a chance to do something interesting, and I want to extend my future contributions to Spark
into a GSoC project. So I was wondering whether there are specific ideas, issues or suggestions
regarding MLlib (mainly), SQL or other components that could be gathered into a project. After looking
into the MLlib 2.0 Roadmap, I found many issues I am interested in (e.g., Python/SparkR API for ML,
PMML export, etc.). If the community has other ideas, I am very willing to work on some issues before GSoC.
I will post a link to my very rough draft proposal here later.