Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Done
Description
This was originally proposed by Hossein Falaki.
This is a proposal for a new package within the Spark distribution to support common statistical estimators. We think consolidating statistics-related functions in a separate package will improve the readability of the core source code and encourage Spark users to contribute their own functions back.
Please see the initial design document here: https://docs.google.com/document/d/1Kju9kWSYMXMjEO6ggC9bF9eNbaM4MxcFs_KDqgAcH9c/pub
Attachments
No. | Summary | Status | Assignee
1 | Stratified sampling | Closed | Doris Xin
2 | Correlations | Resolved | Doris Xin
3 | Random RDD generator | Resolved | Doris Xin
4 | Chi-squared test | Closed | Doris Xin
5 | Bootstrapping | Resolved | Yu Ishikawa
6 | Correlations (Pearson, Spearman) | Closed | Doris Xin
7 | Ser/De for Double to enable calling Java API from python in MLlib | Resolved | Doris Xin
8 | Python version of Random RDD without support for arbitrary distribution | Resolved | Doris Xin
9 | Python correlations | Resolved | Doris Xin
10 | Python support for chi-squared test | Closed | Davies Liu
11 | colStats in Statistics as wrapper around MultivariateStatisticalSummary in Scala and Python | Resolved | Doris Xin