Hive / HIVE-4160 Vectorized Query Execution in Hive / HIVE-4612

Vectorized aggregates do not emit proper rows in presence of GROUP BY

    Details

      Description

      I discovered this while testing the fix for HIVE-4451 and HIVE-4452. The VGBy (vectorized GROUP BY operator) is emitting the appropriate number of rows, but the row-mode ReduceSinkOperator only logs one row and the final result is incomplete. Investigating. Related to HIVE-4599.

      Attachments

      1. HIVE-4612.0.patch.txt
        6 kB
        Remus Rusanu
      2. HIVE-4612.1.patch.txt
        63 kB
        Remus Rusanu

        Activity

        Remus Rusanu added a comment -

        Added HIVE-4652 for "VectorHashKeyWrapperBatch.java should be in vector package (instead of exec)". Thanks!

        Ashutosh Chauhan added a comment -

        Committed to branch. Thanks, Remus!

        Ashutosh Chauhan added a comment -

        Not specific to this patch, but VectorHashKeyWrapperBatch.java should be in vector package (instead of exec). Can you file a follow-up jira to move that file?

        Remus Rusanu added a comment -

        Add support for all types

        Remus Rusanu added a comment -

        The HIVE-4603 commit breaks the current patch. I will follow up with a more comprehensive solution that supports all the primitive types that GROUP BY keys are required to support in this release.
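        One way to picture the "all primitive types" follow-up is to serialize each key at the width of its declared type rather than the width of the vector that holds it. The sketch below is hypothetical: DeclaredTypeKeyWriter and its methods are not Hive APIs. It assumes that, as in Hive's vectorized row batches, boolean, tinyint, smallint, int and bigint keys are held as long values, and that float and double keys are held as double values. It uses plain big-endian ByteBuffer encoding, not Hive's actual key serialization format.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not a Hive API): writes a GROUP BY key at the width of
 * its declared primitive type instead of the width of the vector that holds it.
 * Plain big-endian ByteBuffer encoding stands in for Hive's real key format.
 */
public class DeclaredTypeKeyWriter {

    // Assumption: boolean, tinyint, smallint, int and bigint keys arrive as long.
    static byte[] writeLongBacked(String declaredType, long v) {
        switch (declaredType) {
            case "boolean":
                return ByteBuffer.allocate(1).put((byte) (v != 0 ? 1 : 0)).array();
            case "tinyint":
                return ByteBuffer.allocate(Byte.BYTES).put((byte) v).array();
            case "smallint":
                return ByteBuffer.allocate(Short.BYTES).putShort((short) v).array();
            case "int":
                return ByteBuffer.allocate(Integer.BYTES).putInt((int) v).array();
            case "bigint":
                return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
            default:
                throw new IllegalArgumentException("not a long-backed type: " + declaredType);
        }
    }

    // Assumption: float and double keys arrive as double.
    static byte[] writeDoubleBacked(String declaredType, double v) {
        switch (declaredType) {
            case "float":
                return ByteBuffer.allocate(Float.BYTES).putFloat((float) v).array();
            case "double":
                return ByteBuffer.allocate(Double.BYTES).putDouble(v).array();
            default:
                throw new IllegalArgumentException("not a double-backed type: " + declaredType);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeLongBacked("int", 42L).length);     // 4
        System.out.println(writeLongBacked("bigint", 42L).length);  // 8
        System.out.println(writeDoubleBacked("float", 1.5).length); // 4
    }
}
```

        In this sketch, narrowing happens at write time. As a result, a reader that decodes keys by their declared type needs no change.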

        Remus Rusanu added a comment -

        https://reviews.apache.org/r/11427/

        Remus Rusanu added a comment -

        This patch applies on top of HIVE-4451.

        Remus Rusanu added a comment -

        Fix provided for the int key column. I'm convinced the same problem exists for all other types that differ from the supported vectorized types. I will need to revisit this. I propose to address it, along with other issues, in HIVE-4604.

        Remus Rusanu added a comment -

        The problem is that the BinaryWritable key emitted by VGBy does not match the original column type. For example, if the column is 'int', VGBy emits a 'long', i.e. 8 bytes instead of 4. The reduce side reads only the 4 bytes it knows about, hence the corrupted result.
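        The width mismatch described in this comment can be reproduced in isolation. The following is a minimal Java sketch under stated assumptions, not Hive code. It uses plain big-endian java.nio.ByteBuffer encoding rather than Hive's actual key serialization, and the names KeyWidthMismatchDemo, writeKeyAsLong, writeKeyAsInt and readIntKey are hypothetical. It shows a reader that expects an 'int' decoding only the first 4 of the 8 bytes written for a 'long'.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical illustration (not Hive code): the map side serializes an
 * 'int' GROUP BY key as an 8-byte long, while the reduce side decodes
 * only the 4 bytes an 'int' occupies.
 */
public class KeyWidthMismatchDemo {

    // Map side, buggy: the key comes from a long-backed vector and is written as 8 bytes.
    static byte[] writeKeyAsLong(long key) {
        return ByteBuffer.allocate(Long.BYTES).putLong(key).array();
    }

    // Map side, fixed: the key is narrowed to the declared 'int' width of 4 bytes.
    static byte[] writeKeyAsInt(long key) {
        return ByteBuffer.allocate(Integer.BYTES).putInt((int) key).array();
    }

    // Reduce side: knows the column is 'int', so it decodes the first 4 bytes only.
    static int readIntKey(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        long key = 42;
        System.out.println(readIntKey(writeKeyAsLong(key))); // 0: reads the high-order half of the long
        System.out.println(readIntKey(writeKeyAsInt(key)));  // 42
    }
}
```

        In this simplified big-endian model, the first 4 bytes of a small positive long are its all-zero high-order half, so the reader sees 0. In a multi-column key, the 4 unread bytes would also be misread as the start of the next field.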

          People

          • Assignee: Remus Rusanu
          • Reporter: Remus Rusanu
          • Votes: 0
          • Watchers: 2
