Hive / HIVE-3455

ANSI CORR(X,Y) is incorrect

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.11.0, 0.12.0
    • Fix Version/s: 0.13.0
    • Component/s: UDF
    • the patch for the src/ql/src/java/org/apache/hadoop/hive/ql/udf/generic
    • correlation UDAF

    Description

      A simple test with two collinear vectors, whose correlation should be exactly 1 or -1, returns a wrong result.
      The problem is in how the partial variances are merged in this file:

      http://svn.apache.org/viewvc/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java?revision=1157222&view=markup

      Lines 347 and 348 currently read:
      347: myagg.xvar += xvarB + (xavgA - xavgB) * (xavgA - xavgB) * myagg.count;
      348: myagg.yvar += yvarB + (yavgA - yavgB) * (yavgA - yavgB) * myagg.count;

      The correct merge should be:
      347: myagg.xvar += xvarB + (xavgA - xavgB) * (xavgA - xavgB) / myagg.count * nA * nB;
      348: myagg.yvar += yvarB + (yavgA - yavgB) * (yavgA - yavgB) / myagg.count * nA * nB;
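
      The effect of the two weightings can be checked with a small standalone sketch. It is not the Hive source: the class CorrMergeSketch, the Agg holder and the helpers of, merge and corr are hypothetical names. It assumes that xvar, yvar and covar hold sums of squared (or cross) deviations, that the merged count equals nA + nB, and that the covariance term uses the usual nA * nB / (nA + nB) weighting, since the issue does not quote that line.

```java
// Hypothetical sketch of merging two partial correlation aggregates.
// Not the Hive implementation; names and structure are assumptions.
final class CorrMergeSketch {

    /** Partial aggregate over one chunk of (x, y) pairs. */
    static final class Agg {
        long count;
        double xavg, yavg;          // running means
        double xvar, yvar, covar;   // sums of squared / cross deviations (not divided by count)
    }

    /** Builds an aggregate from one chunk with a plain two-pass computation. */
    static Agg of(double[] xs, double[] ys) {
        Agg a = new Agg();
        a.count = xs.length;
        for (int i = 0; i < xs.length; i++) {
            a.xavg += xs[i];
            a.yavg += ys[i];
        }
        a.xavg /= a.count;
        a.yavg /= a.count;
        for (int i = 0; i < xs.length; i++) {
            double dx = xs[i] - a.xavg;
            double dy = ys[i] - a.yavg;
            a.xvar += dx * dx;
            a.yvar += dy * dy;
            a.covar += dx * dy;
        }
        return a;
    }

    /** Merges two non-empty aggregates; 'corrected' selects the weighting proposed in the issue. */
    static Agg merge(Agg a, Agg b, boolean corrected) {
        long nA = a.count;
        long nB = b.count;
        Agg m = new Agg();
        m.count = nA + nB;
        m.xavg = (a.xavg * nA + b.xavg * nB) / m.count;
        m.yavg = (a.yavg * nA + b.yavg * nB) / m.count;
        double dx = a.xavg - b.xavg;
        double dy = a.yavg - b.yavg;
        double pairWeight = (double) nA * nB / m.count;
        // Original lines multiply by the merged count; the proposed fix uses nA * nB / count.
        double varWeight = corrected ? pairWeight : (double) m.count;
        m.xvar = a.xvar + b.xvar + dx * dx * varWeight;
        m.yvar = a.yvar + b.yvar + dy * dy * varWeight;
        m.covar = a.covar + b.covar + dx * dy * pairWeight; // assumed weighting, see lead-in
        return m;
    }

    /** Pearson correlation from the accumulated sums. */
    static double corr(Agg a) {
        return a.covar / Math.sqrt(a.xvar * a.yvar);
    }

    public static void main(String[] args) {
        // Collinear data: y = 2 * x, split into two chunks.
        Agg a = of(new double[] {1, 2}, new double[] {2, 4});
        Agg b = of(new double[] {3, 4}, new double[] {6, 8});
        System.out.println("original merge:  " + corr(merge(a, b, false)));
        System.out.println("corrected merge: " + corr(merge(a, b, true)));
    }
}
```

      For this collinear input the corrected merge yields a correlation of 1, while the original weighting inflates both variances and produces a value well below 1.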

      Attachments

        1. my.patch
          2 kB
          Maxim Bolotin
        2. HIVE3455.corrTest.tar.gz
          3 kB
          Jon Hartlaub
        3. HIVE-3455.1.patch.txt
          6 kB
          Navis Ryu

            People

              Assignee: Maxim Bolotin (maximbo)
              Reporter: Maxim Bolotin (maximbo)
              Votes: 2
              Watchers: 7
