Description
This JIRA tracks computing a numerically stable covariance between two columns of a DataFrame. The method `cov` should live under `df.stat` (similar to `na`).
df.stat.cov(col1, col2): Double
Stable algorithm: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance
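For reference, the one-pass co-moment update described on that page can be sketched in plain Scala as below. This is only an illustration over in-memory sequences, not the proposed DataFrame implementation: the `StableCov` object and its `cov` helper are hypothetical names, and the sketch assumes sample covariance (division by n - 1), which this issue does not specify.

```scala
// Hypothetical helper for illustration; not part of Spark.
object StableCov {
  /** Sample covariance of two equally long sequences via a one-pass co-moment update. */
  def cov(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.length == ys.length, "columns must have the same length")
    var n = 0L        // number of pairs seen so far
    var meanX = 0.0   // running mean of x
    var meanY = 0.0   // running mean of y
    var coMoment = 0.0 // running sum of (x - meanX) * (y - meanY)
    xs.zip(ys).foreach { case (x, y) =>
      n += 1
      val dx = x - meanX            // deviation from the previous mean of x
      meanX += dx / n
      meanY += (y - meanY) / n
      coMoment += dx * (y - meanY)  // previous-mean dx times updated-mean dy
    }
    // Assumption: sample covariance; undefined for fewer than two pairs.
    if (n < 2) Double.NaN else coMoment / (n - 1)
  }
}
```

Because each step works with deviations from the running means rather than raw sums of products, this avoids the cancellation error of the naive E[XY] - E[X]E[Y] formula when the means are large relative to the spread.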
UDAF support will be added later, after which users will be able to write:
df.groupBy("gender").agg(cov("age", "salary").as("cov_age_salary"))
Issue Links
- is depended upon by SPARK-7241 Pearson correlation for DataFrames (Resolved)