Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SQL
    • Labels: None

    Description

      This JIRA is for computing the covariance between two columns using a numerically stable algorithm. The method `cov` should live under `df.stat`, in the same way that the missing-data functions live under `df.na`.

      df.stat.cov(col1, col2): Double
      

      Stable algorithm: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance
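
To make the linked algorithm concrete, here is a minimal sketch of the single-pass co-moment update described in that article, written in plain Scala over in-memory sequences. The object name `StableCov` and its `cov` helper are hypothetical and are not taken from Spark's source. The sketch assumes the sample covariance (division by n - 1) is what is wanted.

```scala
// Hypothetical helper for illustration only; not Spark's implementation.
object StableCov {
  /** Sample covariance of paired values, computed in one pass. */
  def cov(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.length == ys.length, "columns must have the same length")
    var n = 0L
    var meanX = 0.0
    var meanY = 0.0
    var c = 0.0 // running co-moment: sum of (x - meanX) * (y - meanY)
    xs.zip(ys).foreach { case (x, y) =>
      n += 1
      val dx = x - meanX       // deviation from the previous mean of x
      meanX += dx / n
      meanY += (y - meanY) / n
      c += dx * (y - meanY)    // previous mean for x, updated mean for y
    }
    // Fewer than two pairs: the sample covariance is undefined.
    if (n < 2) Double.NaN else c / (n - 1)
  }
}
```

Because the update works with deviations from the running means rather than with raw sums of products, it avoids the cancellation that the naive formula E[XY] - E[X]E[Y] suffers from when the values are large relative to their spread.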

      UDAF (user-defined aggregate function) support will be added later, after which users will be able to compute the covariance per group:

      df.groupBy("gender").agg(cov("age", "salary").as("cov_age_salary"))
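
A grouped aggregate needs a buffer that can be updated row by row and merged across partitions. The following is a rough sketch of such a buffer in plain Scala, using the pairwise merge formula from the same Wikipedia article. `CovState`, `GroupedCov` and `covByKey` are hypothetical names chosen for this illustration. This is not Spark's UDAF API, and it only mimics the grouping on an in-memory sequence of `(key, x, y)` rows.

```scala
// Hypothetical mergeable buffer for illustration; not Spark's UDAF API.
final case class CovState(n: Long, meanX: Double, meanY: Double, c: Double) {
  /** Fold one (x, y) pair into the running means and co-moment. */
  def update(x: Double, y: Double): CovState = {
    val n1 = n + 1
    val dx = x - meanX
    val newMeanX = meanX + dx / n1
    val newMeanY = meanY + (y - meanY) / n1
    CovState(n1, newMeanX, newMeanY, c + dx * (y - newMeanY))
  }

  /** Combine two partial states, e.g. from two partitions. */
  def merge(o: CovState): CovState =
    if (n == 0) o
    else if (o.n == 0) this
    else {
      val total = n + o.n
      val dx = o.meanX - meanX
      val dy = o.meanY - meanY
      CovState(
        total,
        meanX + dx * o.n / total,
        meanY + dy * o.n / total,
        c + o.c + dx * dy * n * o.n / total)
    }

  /** Sample covariance; undefined for fewer than two pairs. */
  def sampleCov: Double = if (n < 2) Double.NaN else c / (n - 1)
}

object CovState {
  val empty: CovState = CovState(0L, 0.0, 0.0, 0.0)
}

object GroupedCov {
  /** Per-key sample covariance over rows of (key, x, y). */
  def covByKey(rows: Seq[(String, Double, Double)]): Map[String, Double] =
    rows.groupBy(_._1).map { case (key, rs) =>
      key -> rs.foldLeft(CovState.empty)((s, r) => s.update(r._2, r._3)).sampleCov
    }
}
```

The `merge` step is what makes the computation safe to distribute: each partition can build its own state independently, and the states can be combined in any order.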
      

            People

              brkyvz Burak Yavuz
              mengxr Xiangrui Meng