IMPALA-5304

Parquet scanner transfers decompression buffers when not needed

      Description

      The Parquet scanner always transfers decompression buffers to the scratch batch:

      Status BaseScalarColumnReader::ReadDataPage() {
        // We're about to move to the next data page.  The previous data page is
        // now complete, pass along the memory allocated for it.
        parent_->scratch_batch_->mem_pool()->AcquireData(
            decompressed_data_pool_.get(), false);
        ...
      }

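      These buffers are in turn passed along with the row batch. This is safe, but it is unnecessary in the many cases where the batch holds no pointers into the decompression buffer: when the column contains only fixed-length data (the values are copied into the tuples), or when the data page is dictionary-encoded (the values reference the dictionary rather than the data page).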
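
The condition described above could be sketched roughly as follows. This is a simplified, self-contained model and not Impala's actual code: `SimplePool`, `PageInfo`, `BatchReferencesPage` and `ReleaseCompletedPage` are hypothetical names introduced only for illustration, and freeing the buffer in the non-transfer case is an assumption (reusing it for the next page would be another option).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory pool: it simply owns a list of buffers.
struct SimplePool {
  std::vector<std::unique_ptr<unsigned char[]>> chunks;
  std::size_t total_bytes = 0;

  void Allocate(std::size_t n) {
    chunks.push_back(std::make_unique<unsigned char[]>(n));
    total_bytes += n;
  }

  // Moves every chunk from 'src' into this pool, leaving 'src' empty.
  void AcquireData(SimplePool* src) {
    assert(src != nullptr && src != this);
    for (auto& chunk : src->chunks) chunks.push_back(std::move(chunk));
    total_bytes += src->total_bytes;
    src->chunks.clear();
    src->total_bytes = 0;
  }

  void FreeAll() {
    chunks.clear();
    total_bytes = 0;
  }
};

// Hypothetical summary of the data page that has just been completed.
struct PageInfo {
  bool fixed_length_values;  // values were copied into the tuples
  bool dictionary_encoded;   // values reference the dictionary, not the page
};

// True only if rows in the batch may still point into the decompressed page.
bool BatchReferencesPage(const PageInfo& page) {
  return !page.fixed_length_values && !page.dictionary_encoded;
}

// Called when moving on to the next data page. Transfers the decompression
// buffers to the batch only when the batch may reference them; otherwise the
// buffers are released locally. Returns true if a transfer happened.
bool ReleaseCompletedPage(const PageInfo& page, SimplePool* decompressed_pool,
                          SimplePool* batch_pool) {
  if (BatchReferencesPage(page)) {
    batch_pool->AcquireData(decompressed_pool);
    return true;
  }
  decompressed_pool->FreeAll();
  return false;
}
```

Under these assumptions, only variable-length columns read from non-dictionary pages attach their decompression buffers to the batch; every other page releases its buffer in the scanner thread.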

      This can make problems like IMPALA-4923 worse than they would be otherwise because extra data is transferred across threads.

      People

      • Assignee: Tim Armstrong (tarmstrong)
      • Reporter: Tim Armstrong (tarmstrong)