Hive / HIVE-6664

Vectorized variance computation differs from row mode computation.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      The following query shows the difference:
      select var_samp(ss_sales_price), var_pop(ss_sales_price), stddev_pop(ss_sales_price), stddev_samp(ss_sales_price) from store_sales;

      The results differ because, when computing variance, row mode converts each decimal value to double up front and accumulates the sum of values as double. Vectorized mode instead accumulates the local aggregate sum as decimal and converts it to double only at flush.
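
      The effect can be reproduced outside Hive with a minimal sketch. This is not Hive's code: the function names and the sample values are hypothetical, Python's `Decimal` stands in for Hive's decimal type, and `float` stands in for double. It only illustrates why the order of conversion and summation matters.

```python
from decimal import Decimal


def sum_row_mode(values):
    """Convert each decimal to double first, then accumulate as double."""
    total = 0.0
    for v in values:
        total += float(v)  # rounding error can creep in at every addition
    return total


def sum_vector_mode(values):
    """Accumulate exactly as decimal; convert to double once at the end."""
    total = Decimal(0)
    for v in values:
        total += v  # exact decimal addition for these inputs
    return float(total)  # single conversion, analogous to the flush step


# Hypothetical sample data: ten prices of 0.10 each.
prices = [Decimal("0.10")] * 10

row_sum = sum_row_mode(prices)
vec_sum = sum_vector_mode(prices)

print(row_sum == vec_sum)  # False: the two strategies yield different doubles
```

      Any variance or standard deviation derived from these sums inherits the discrepancy, so the two execution modes only agree if they apply the same conversion order.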

        Attachments

        1. HIVE-6664.1.patch
          11 kB
          Jitendra Nath Pandey
        2. HIVE-6664.1.patch
          11 kB
          Jitendra Nath Pandey
        3. HIVE-6664.1.patch
          11 kB
          Jitendra Nath Pandey

            People

            • Assignee: Jitendra Nath Pandey
            • Reporter: Jitendra Nath Pandey
