Hive / HIVE-6664

Vectorized variance computation differs from row mode computation.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      The following query shows the difference:
      select var_samp(ss_sales_price), var_pop(ss_sales_price), stddev_pop(ss_sales_price), stddev_samp(ss_sales_price) from store_sales;

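      The difference arises because, when computing variance, row mode converts each decimal value to double upfront and accumulates the sum of values in double. Vectorized mode instead keeps the local aggregate sum as decimal and converts it to double only at flush. The two sums can therefore differ by floating-point rounding error, which carries through to the variance and standard deviation results.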
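
The effect described above can be reproduced outside Hive. The following is a minimal Python sketch, not Hive's actual code: the function names `sum_row_mode` and `sum_vector_mode` and the sample prices are hypothetical, chosen only to illustrate how the order of conversion changes the accumulated sum.

```python
from decimal import Decimal


def sum_row_mode(values):
    """Hypothetical stand-in for row mode: convert each decimal
    to double first, then accumulate the sum in double."""
    total = 0.0
    for v in values:
        total += float(v)  # rounding error can enter at every step
    return total


def sum_vector_mode(values):
    """Hypothetical stand-in for vectorized mode: accumulate exactly
    in decimal, then convert to double once (the "flush")."""
    total = Decimal(0)
    for v in values:
        total += v  # exact decimal addition
    return float(total)  # single conversion at the end


# Illustrative data only: ten prices of 0.1, which is not exactly
# representable as a binary double.
prices = [Decimal("0.1")] * 10

row_sum = sum_row_mode(prices)
vector_sum = sum_vector_mode(prices)

print(row_sum, vector_sum, row_sum == vector_sum)
```

The two sums differ in the last bits, and any variance formula that consumes them inherits that discrepancy. Matching the results requires both modes to convert to double at the same point in the computation.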

      1. HIVE-6664.1.patch
        11 kB
        Jitendra Nath Pandey


          People

          • Assignee:
            Jitendra Nath Pandey
            Reporter:
            Jitendra Nath Pandey
          • Votes:
            0
            Watchers:
            2
