Hive / HIVE-6873

DISTINCT clause in aggregates is handled incorrectly by vectorized execution

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0, 0.14.0
    • Fix Version/s: 0.13.0
    • Component/s: Query Processor
    • Labels: None

    Description

      The vectorized aggregates ignore the DISTINCT clause, which causes incorrect results. Because of how GroupByOperatorDesc adds the DISTINCT keys to the overall aggregate keys, the vectorized aggregates do account for the extra key, but they do not process the data correctly for that key. On the reduce side, the aggregates then combine the input from the vectorized map side into results that are occasionally correct but mostly incorrect. HIVE-4607 tracks the proper fix; in the meantime, I'm filing this bug to disable vectorized execution if DISTINCT is present. The fix is trivial.
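
      The interim rule described above (skip vectorization whenever an aggregate carries DISTINCT) can be sketched as follows. This is an illustration only, not the code from the attached patches: the class DistinctVectorizationGuard, its nested Aggregate type, and all method names are hypothetical stand-ins rather than Hive's own classes. The two counting helpers are a deliberately simplified model of what dropping DISTINCT does; the wrong values Hive actually produced may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class DistinctVectorizationGuard {

    /** Hypothetical stand-in for an aggregate descriptor; not Hive's class. */
    public static final class Aggregate {
        public final String function;
        public final boolean distinct;

        public Aggregate(String function, boolean distinct) {
            this.function = function;
            this.distinct = distinct;
        }
    }

    /** Interim rule: a single DISTINCT aggregate rules out vectorization. */
    public static boolean canVectorize(List<Aggregate> aggregates) {
        for (Aggregate agg : aggregates) {
            if (agg.distinct) {
                return false; // fall back to row-mode execution
            }
        }
        return true;
    }

    /** What COUNT(DISTINCT col) should return for the rows of one group. */
    public static long countDistinct(List<String> values) {
        return new HashSet<>(values).size();
    }

    /** Simplified model of an aggregate that drops DISTINCT: counts every row. */
    public static long countIgnoringDistinct(List<String> values) {
        return values.size();
    }

    public static void main(String[] args) {
        List<String> withDuplicates = Arrays.asList("a", "a", "b");
        List<String> noDuplicates = Arrays.asList("a", "b", "c");

        // Duplicates present: the two counts disagree (2 vs 3).
        System.out.println(countDistinct(withDuplicates));
        System.out.println(countIgnoringDistinct(withDuplicates));

        // No duplicates: the counts happen to agree (3 and 3).
        System.out.println(countDistinct(noDuplicates));
        System.out.println(countIgnoringDistinct(noDuplicates));

        // The guard rejects any plan containing a DISTINCT aggregate.
        List<Aggregate> plan = Arrays.asList(
                new Aggregate("sum", false),
                new Aggregate("count", true));
        System.out.println(canVectorize(plan)); // false
    }
}
```

      In this simplified model the two counts agree only when the column has no duplicate values within a group, which matches the report that results are occasionally correct but mostly incorrect.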

      Attachments

        1. image004.png
          3 kB
          Remus Rusanu
        2. image003.png
          1 kB
          Remus Rusanu
        3. image002.png
          1.0 kB
          Remus Rusanu
        4. image001.png
          2 kB
          Remus Rusanu
        5. HIVE-6873.3.patch
          9 kB
          Jitendra Nath Pandey
        6. HIVE-6873.2.patch
          0.8 kB
          Remus Rusanu
        7. HIVE-6873.1.patch
          0.7 kB
          Remus Rusanu

          People

            Assignee: Remus Rusanu
            Reporter: Remus Rusanu
