Hive / HIVE-20177

Vectorization: Reduce KeyWrapper allocation in GroupBy Streaming mode


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0
    • Component/s: Vectorization

    Description

      The streaming mode for VectorGroupBy allocates a large number of arrays due to VectorHashKeyWrapper::duplicateTo().

      Since the batch's keys cannot be mutated in place while a single batch is being processed, these copies can be cut by up to ~1000x (one copy per batch instead of one per key change; a VectorizedRowBatch holds up to 1024 rows by default) by copying the streaming key once at the end of the loop instead of re-copying it inside the loop.

            for (int i = 0; i < batch.size; ++i) {
              if (!batchKeys[i].equals(streamingKey)) {
                // We've encountered a new key, must save current one
                // We can't forward yet, the aggregators have not been evaluated
                rowsToFlush[flushMark] = currentStreamingAggregators;
                if (keysToFlush[flushMark] == null) {
                  keysToFlush[flushMark] = (VectorHashKeyWrapper) streamingKey.copyKey();
                } else {
                  streamingKey.duplicateTo(keysToFlush[flushMark]);
                }

                currentStreamingAggregators = streamAggregationBufferRowPool.getFromPool();
                // Deep copy on every key change: the source of the per-key array allocations
                batchKeys[i].duplicateTo(streamingKey);
                ++flushMark;
              }
              // (rest of the loop body not shown)
            }
      

      The duplicateTo() call can be pushed out of the loop, since the only key that truly needs its own copy is the last unique key in the VRB (VectorizedRowBatch): it is the one still in use after the batch has been processed.
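      A minimal sketch of the idea in self-contained Java, not Hive's actual classes: Key is a hypothetical stand-in for VectorHashKeyWrapper, and processBatch(), deepCopy(), run() and the other names are illustrative only. The eager mode copies each new key inside the loop (roughly what batchKeys[i].duplicateTo(streamingKey) does in the original code); the deferred mode only references the batch's keys inside the loop and copies the single open key once after the loop, before the next batch overwrites the reused Key objects.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of streaming GROUP BY over key-sorted batches.
// Key stands in for VectorHashKeyWrapper; the Key objects of a batch are
// reused (overwritten) for the next batch, as batch key wrappers are.
public class StreamingKeyDemo {

  static final class Key {
    byte[] bytes;
    Key(byte[] bytes) { this.bytes = bytes; }
    // Overwrites this wrapper's contents, simulating reuse across batches.
    void set(String s) { bytes = s.getBytes(StandardCharsets.UTF_8); }
    boolean sameAs(Key o) { return o != null && Arrays.equals(bytes, o.bytes); }
    @Override public String toString() { return new String(bytes, StandardCharsets.UTF_8); }
  }

  int copies = 0;                                 // deep key copies made
  final List<String> flushed = new ArrayList<>(); // "key=rowCount" per finished group
  Key streamingKey = null;                        // key of the group still open
  int count = 0;                                  // rows seen for that group

  private Key deepCopy(Key k) {
    copies++;
    return new Key(Arrays.copyOf(k.bytes, k.bytes.length));
  }

  // eager == true copies on every key change; false defers to one copy per batch.
  void processBatch(Key[] batchKeys, int size, boolean eager) {
    List<Key> keysToFlush = new ArrayList<>();
    List<Integer> countsToFlush = new ArrayList<>();
    for (int i = 0; i < size; ++i) {
      if (!batchKeys[i].sameAs(streamingKey)) {
        if (streamingKey != null) {
          keysToFlush.add(streamingKey); // a reference, not a copy
          countsToFlush.add(count);
        }
        streamingKey = eager ? deepCopy(batchKeys[i]) : batchKeys[i];
        count = 0;
      }
      count++;
    }
    // Flush happens before this batch is released, so references are still valid.
    for (int j = 0; j < keysToFlush.size(); j++) {
      flushed.add(keysToFlush.get(j) + "=" + countsToFlush.get(j));
    }
    // Only the still-open key outlives the batch: copy it once, here.
    if (!eager && streamingKey != null) {
      streamingKey = deepCopy(streamingKey);
    }
  }

  void close() {
    if (streamingKey != null) {
      flushed.add(streamingKey + "=" + count);
    }
  }

  static StreamingKeyDemo run(boolean eager) {
    StreamingKeyDemo demo = new StreamingKeyDemo();
    Key[] batchKeys = new Key[4];
    for (int i = 0; i < batchKeys.length; i++) {
      batchKeys[i] = new Key(new byte[0]);
    }
    String[][] batches = {{"a", "a", "b", "c"}, {"c", "d", "e", "e"}};
    for (String[] batch : batches) {
      for (int i = 0; i < batch.length; i++) {
        batchKeys[i].set(batch[i]); // same Key objects, new contents
      }
      demo.processBatch(batchKeys, batch.length, eager);
    }
    demo.close();
    return demo;
  }

  public static void main(String[] args) {
    StreamingKeyDemo eager = run(true);
    StreamingKeyDemo deferred = run(false);
    System.out.println(eager.flushed + " copies=" + eager.copies);
    System.out.println(deferred.flushed + " copies=" + deferred.copies);
  }
}
```

      Both modes produce [a=2, b=1, c=2, d=1, e=2]; the eager mode makes 5 deep copies and the deferred mode makes 2 (one per batch). Dropping the end-of-batch copy would let the second batch overwrite the open key "c" in place, which is why that single copy is still required.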

      The actual byte[] values within the keys are safely copied out by VectorHashKeyWrapperBatch.assignRowColumn(), which calls setVal() rather than setRef(), so the output column holds its own copy of the bytes instead of a reference into the key.
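      For context on the setVal()/setRef() distinction: in Hive's BytesColumnVector, setRef() only records a reference to the caller's byte[] (plus offset and length), while setVal() copies the bytes into storage owned by the column. The sketch below uses a hypothetical, heavily simplified BytesColumn class (not Hive's API) to show why only the copying variant is safe when the source key buffer is later reused.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified model of a bytes column; NOT Hive's BytesColumnVector.
public class SetValVsSetRef {

  static final class BytesColumn {
    final byte[][] vector; // per-row backing array
    final int[] start;     // per-row offset into vector[row]
    final int[] length;    // per-row byte length

    BytesColumn(int rows) {
      vector = new byte[rows][];
      start = new int[rows];
      length = new int[rows];
    }

    // Reference mode: the row aliases the caller's buffer, no copy is made.
    void setRef(int row, byte[] src, int off, int len) {
      vector[row] = src;
      start[row] = off;
      length[row] = len;
    }

    // Copy mode: the row gets its own copy (this sketch allocates per row;
    // Hive's BytesColumnVector copies into a buffer the column owns).
    void setVal(int row, byte[] src, int off, int len) {
      vector[row] = Arrays.copyOfRange(src, off, off + len);
      start[row] = 0;
      length[row] = len;
    }

    String get(int row) {
      return new String(vector[row], start[row], length[row], StandardCharsets.UTF_8);
    }
  }

  // Writes the same key buffer into row 0 by reference and row 1 by value,
  // then overwrites the buffer as a reused key wrapper would be.
  static String[] demo() {
    byte[] keyBuffer = "alpha".getBytes(StandardCharsets.UTF_8);
    BytesColumn out = new BytesColumn(2);
    out.setRef(0, keyBuffer, 0, keyBuffer.length);
    out.setVal(1, keyBuffer, 0, keyBuffer.length);
    byte[] next = "omega".getBytes(StandardCharsets.UTF_8);
    System.arraycopy(next, 0, keyBuffer, 0, next.length);
    return new String[] {out.get(0), out.get(1)};
  }

  public static void main(String[] args) {
    String[] rows = demo();
    System.out.println(rows[0] + " " + rows[1]); // prints: omega alpha
  }
}
```

      The referenced row changes when the key buffer is overwritten, while the copied row keeps its original value.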

      Attachments

        1. HIVE-20177.01.patch
          2 kB
          Gopal Vijayaraghavan
        2. HIVE-20177.WIP.patch
          2 kB
          Gopal Vijayaraghavan


          People

            Assignee: Gopal Vijayaraghavan (gopalv)
            Reporter: Gopal Vijayaraghavan (gopalv)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: