  1. Hive
  2. HIVE-6664

Vectorized variance computation differs from row mode computation.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

    Description

      The following query shows the difference:
      select var_samp(ss_sales_price), var_pop(ss_sales_price), stddev_pop(ss_sales_price), stddev_samp(ss_sales_price) from store_sales;

      The results differ because row mode converts each decimal value to double up front and accumulates the sum of values in double when computing variance, whereas vector mode accumulates the local aggregate sum as decimal and converts it to double only at flush.
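
      The effect of the two accumulation orders can be reproduced outside Hive. The following is a minimal Python sketch, not Hive code: the function names and the sample prices are hypothetical, and Python's Decimal and float stand in for Hive's decimal and double types.

```python
from decimal import Decimal


def sum_row_mode(values):
    """Row-mode style: convert each decimal to double first, then add in double."""
    total = 0.0
    for v in values:
        total += float(v)  # rounding error can accumulate on every addition
    return total


def sum_vector_mode(values):
    """Vector-mode style: add exactly as decimal, convert to double once at flush."""
    total = Decimal(0)
    for v in values:
        total += v  # exact decimal addition
    return float(total)  # single conversion at the end


# Hypothetical sample data: ten identical prices of 0.10.
prices = [Decimal("0.10")] * 10

row_sum = sum_row_mode(prices)
vec_sum = sum_vector_mode(prices)

print(row_sum == vec_sum)  # False: the two sums differ in the last bits
```

      Any variance or standard deviation derived from these sums inherits the same small discrepancy, which is consistent with the mismatch described above.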

      Attachments

        1. HIVE-6664.1.patch
          11 kB
          Jitendra Nath Pandey

          People

            Assignee: Jitendra Nath Pandey
            Reporter: Jitendra Nath Pandey
            Votes: 0
            Watchers: 2
