Details
- Type: Umbrella
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
Description
Similar to SPARK-10384, it would be nice to have bivariate statistics support in DataFrames (defined as UDAFs). This JIRA discusses the general implementation and tracks subtasks. Bivariate statistics include:
- continuous: covariance (SPARK-9297), Pearson's correlation (SPARK-9298), and Spearman's correlation (SPARK-10645)
- categorical: ??
If we define them as UDAFs, they can be used flexibly with DataFrames, including in grouped aggregations, e.g.,
df.groupBy("key").agg(corr("x", "y"))
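
To illustrate why a bivariate statistic fits the UDAF model, here is a rough sketch in plain Python (not Spark code, and not necessarily how Spark implements corr). It assumes a UDAF-like contract of update, merge, and evaluate over a small buffer; the names CorrBuffer and corr_by_key are hypothetical and exist only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class CorrBuffer:
    """Hypothetical aggregation buffer for Pearson's correlation."""
    n: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0
    m2_x: float = 0.0  # sum of squared deviations of x from its mean
    m2_y: float = 0.0  # sum of squared deviations of y from its mean
    c_xy: float = 0.0  # sum of products of deviations of x and y

    def update(self, x, y):
        """Fold one (x, y) pair into the buffer (one-pass update)."""
        self.n += 1
        dx = x - self.mean_x
        dy = y - self.mean_y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)
        return self

    def merge(self, other):
        """Combine two partial buffers, e.g. from different partitions."""
        if other.n == 0:
            return self
        n = self.n + other.n
        dx = other.mean_x - self.mean_x
        dy = other.mean_y - self.mean_y
        w = self.n * other.n / n
        self.m2_x += other.m2_x + dx * dx * w
        self.m2_y += other.m2_y + dy * dy * w
        self.c_xy += other.c_xy + dx * dy * w
        self.mean_x += dx * other.n / n
        self.mean_y += dy * other.n / n
        self.n = n
        return self

    def evaluate(self):
        """Return Pearson's r, or NaN when either variance is zero."""
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom > 0 else float("nan")


def corr_by_key(rows):
    """Group (key, x, y) rows by key and compute one correlation per group."""
    buffers = {}
    for key, x, y in rows:
        buffers.setdefault(key, CorrBuffer()).update(x, y)
    return {key: buf.evaluate() for key, buf in buffers.items()}
```

Because the buffer is small and mergeable, partial results can be combined in any order, which is what makes covariance and Pearson's correlation natural grouped aggregates. Spearman's correlation depends on ranks, so it does not reduce to a fixed-size buffer in the same way.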
Issue Links
- relates to
  - SPARK-9297 covar_pop and covar_samp aggregate functions (Resolved)
  - SPARK-9298 corr aggregate functions (Resolved)
Sub-Tasks
1. Bivariate Statistics: Spearman's Correlation in DataFrames | Resolved | Unassigned
2. Bivariate Statistics: Pearson's Chi-Squared goodness of fit test | Resolved | Unassigned
3. Bivariate Statistics: Chi-Squared independence test | Resolved | Unassigned