  1. Pig
  2. PIG-3668

COR built-in function returns NaN for all combinations when at least one of the coefficient values is NaN

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0, 0.11.1, 0.12.1
    • Fix Version/s: None
    • Component/s: internal-udfs
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      When multiple columns are passed to COR for correlation analysis and the coefficient for one of the column combinations is NaN, the values for all other combinations are not computed.

      The Pearson coefficient is NaN when all values in a given column are the same, because that column has zero variance.
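To make the NaN case concrete, here is a minimal sketch of a textbook Pearson computation in plain Java. It is not Pig's COR implementation; the class and method names (`PearsonNaNDemo`, `pearson`) are hypothetical and used for illustration only. With a constant column, both the covariance and the variance are zero, so the final division is 0.0 / 0.0.

```java
public class PearsonNaNDemo {

    // Textbook Pearson correlation over two equal-length arrays.
    // Illustration only; this is not the code in org.apache.pig.builtin.COR.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double cov = sumXY - sumX * sumY / n;   // n * covariance
        double varX = sumX2 - sumX * sumX / n;  // n * variance of x
        double varY = sumY2 - sumY * sumY / n;  // n * variance of y
        // With a constant column, cov and one variance are both 0,
        // so this evaluates 0.0 / 0.0, which is NaN in IEEE 754 arithmetic.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        double[] constant = {5.0, 5.0, 5.0, 5.0};
        double[] varying = {1.0, 2.0, 3.0, 4.0};
        System.out.println(pearson(constant, varying)); // NaN
        System.out.println(pearson(varying, varying));  // 1.0
    }
}
```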

      Example:
      A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
      B = group A all;
      c = foreach B generate group, FLATTEN(COR((bag{tuple(double)}) A.col_1, (bag{tuple(double)}) A.col_2, (bag{tuple(double)}) A.col_3, (bag{tuple(double)}) A.col_4));

      If the Pearson coefficient for col_1 and col_2 is NaN, then the coefficients for all other combinations are also returned as NaN.

      This happens because of the 'return null' statements in the catch blocks on lines 157 and 235 of org.apache.pig.builtin.COR.java (Apache Pig 0.12.0).
      If the catch block is removed, the correlation analysis continues for the remaining columns.
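The failure mode described above can be sketched roughly as follows. This is not the actual COR.java source. The class `CorLoopSketch`, the methods `pairCorrelation` and `correlateAll`, and the `abortOnFailure` flag are all hypothetical, and the sketch assumes the per-pair step signals a degenerate (constant) column by throwing. It only contrasts an early `return null` inside the catch block with recording NaN for the failed pair and moving on.

```java
import java.util.ArrayList;
import java.util.List;

public class CorLoopSketch {

    // Hypothetical per-pair step: throws when a column has zero variance.
    static double pairCorrelation(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double varX = sumX2 - sumX * sumX / n;
        double varY = sumY2 - sumY * sumY / n;
        if (varX == 0.0 || varY == 0.0) {
            throw new ArithmeticException("zero variance");
        }
        return (sumXY - sumX * sumY / n) / Math.sqrt(varX * varY);
    }

    // Computes the coefficient for every pair of columns (i < j).
    // abortOnFailure = true mimics a 'return null' in the catch block:
    // one bad pair discards the results for every other pair as well.
    static List<Double> correlateAll(double[][] cols, boolean abortOnFailure) {
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < cols.length; i++) {
            for (int j = i + 1; j < cols.length; j++) {
                try {
                    out.add(pairCorrelation(cols[i], cols[j]));
                } catch (ArithmeticException e) {
                    if (abortOnFailure) {
                        return null;     // whole result is lost
                    }
                    out.add(Double.NaN); // only this pair is NaN
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] cols = {
            {5.0, 5.0, 5.0, 5.0},  // col_1: constant
            {1.0, 2.0, 3.0, 4.0},  // col_2
            {2.0, 4.0, 6.0, 8.0},  // col_3
            {4.0, 3.0, 2.0, 1.0}   // col_4
        };
        System.out.println(correlateAll(cols, true));
        System.out.println(correlateAll(cols, false));
    }
}
```

In the second call, only the three pairs involving the constant column come back as NaN, while the remaining pairs still get real coefficients, which is the behaviour the report asks for.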

        Attachments

        1. CORR.diff
          1.0 kB
          Hiten Java

            People

            • Assignee: Hiten Java (hitenjava)
            • Reporter: Hiten Java (hitenjava)
            • Votes: 0
            • Watchers: 2