Description
This JIRA is for computing the Pearson linear correlation for two numerical columns in a DataFrame. The method `corr` should live under `df.stat`:
df.stat.corr(col1, col2, method="pearson"): Double
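The `method` parameter defaults to "pearson" and will select the correlation type once other correlations (for example Spearman, SPARK-7245) are added.
As with SPARK-7240, a UDAF (user-defined aggregate function) version will be added later.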
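For reference, the quantity in question is the Pearson coefficient r = cov(x, y) / (std(x) * std(y)). Below is a minimal plain-Python sketch of that formula over two in-memory columns. It is not the Spark implementation: the names `pearson_corr` and `corr` are hypothetical, and the `method` argument only mirrors the proposed signature for illustration.

```python
import math


def pearson_corr(xs, ys):
    """Pearson linear correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("columns must not be empty")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-moment and second moments about the means; the 1/n (or 1/(n-1))
    # factors cancel in the ratio, so they are omitted.
    cxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    cxx = sum((x - mean_x) ** 2 for x in xs)
    cyy = sum((y - mean_y) ** 2 for y in ys)
    if cxx == 0 or cyy == 0:
        # Assumption for this sketch: a constant column yields NaN.
        return float("nan")
    return cxy / math.sqrt(cxx * cyy)


def corr(xs, ys, method="pearson"):
    """Hypothetical dispatcher mirroring the proposed `method` parameter."""
    if method != "pearson":
        raise ValueError("unsupported correlation method: " + method)
    return pearson_corr(xs, ys)
```

Calling `corr([1, 2, 3, 4], [2, 4, 6, 8])` on perfectly linearly related columns gives a coefficient of 1.0 up to floating-point rounding. Dispatching on `method` keeps room for other correlation types without changing the call site.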
Issue Links
- depends upon
  - SPARK-7240 Covariance for DataFrames (Resolved)
- is depended upon by
  - SPARK-7245 Spearman correlation for DataFrames (Resolved)
- links to