Hive / HIVE-24575

VectorGroupByOperator reusing keys can lead to wrong results


Details

    Description

      A common SQL query such as

      select category as category, count(distinct maskdid) as uv from dwd_internal_inc_d group by category

      can return wrong results on trunk: the values in the category column get mixed up, and the
      count(distinct maskdid) aggregate is also wrong.
      After some debugging, we found that the problem is caused by a wrong byteStarts[i] being used when copying the current keys to the reusable keys:
      https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneral.java#L351-L362
      byteStarts[i] is always 0 because of Arrays.fill(byteStarts, 0). As a result, when clone.byteValues[i].length >= byteValues[i].length holds and the existing buffer is reused, the copy reads len bytes of the current key starting at offset 0 rather than at the real start index, so the reusable key receives the wrong bytes. This is what causes the wrong results.
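
The effect of copying from offset 0 instead of the real start index can be reproduced outside Hive. The sketch below is illustrative only and is not the Hive code: the class KeyCopySketch, the helpers copyKey and demo, and the sample keys "books" and "music" are all hypothetical. It assumes two keys packed into one backing byte array and a reusable buffer that is already large enough, which corresponds to the clone.byteValues[i].length >= byteValues[i].length branch described above.

```java
import java.nio.charset.StandardCharsets;

public class KeyCopySketch {

    // Hypothetical helper: copies `length` bytes of `src`, beginning at `start`,
    // into the front of the reusable buffer `dst`.
    static void copyKey(byte[] src, int start, int length, byte[] dst) {
        System.arraycopy(src, start, dst, 0, length);
    }

    // Copies the second key out of a shared backing array using the given start offset
    // and returns what ended up in the reusable buffer.
    static String demo(int start) {
        // Two 5-byte keys, "books" and "music", stored back to back (assumed layout).
        byte[] batch = "booksmusic".getBytes(StandardCharsets.US_ASCII);
        int length = 5;
        // Large enough to hold the key, so it is reused instead of reallocated.
        byte[] reusable = new byte[8];
        copyKey(batch, start, length, reusable);
        return new String(reusable, 0, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(demo(5)); // real start index of the second key: prints "music"
        System.out.println(demo(0)); // start reset to 0: prints "books", the wrong key
    }
}
```

With the start offset reset to 0, the reusable key silently takes on the bytes of a different key. That matches the symptom reported here: rows are grouped under the wrong category and the distinct count is computed against the wrong key.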
       
       

            People

              dengzh Zhihua Deng
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h