IMPALA-3662: Reduce Parquet scanner memory usage

Details

    Description

      After IMPALA-2736, peak memory consumption increased, mostly because of the Parquet scanner.
      In most cases the Parquet scanner buffers more batches than it needs.

      In the attached profiles (tpch_q14_2.5.txt and tpch_q14_2.6.txt), scanner memory increases from 2.17 GB to 3.3 GB, a rise of about 52%.

      Workarounds
      The following query options may help to reduce scanner memory consumption:

      • Reduce the number of scanner threads (set num_scanner_threads=30)
      • Reduce the batch size (set batch_size=512)

      Increasing the memory limit may also help.
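
The workaround options above can be applied per session in impala-shell. The following is a minimal sketch that only renders the `set` statements from a dictionary of options; the helper name `build_set_statements` is hypothetical and is not part of Impala or any client library. The values 30 and 512 are the ones suggested in this issue, not tuned recommendations.

```python
def build_set_statements(options):
    """Render query options as impala-shell `set` statements.

    `options` maps a query option name to its value. This helper is
    illustrative only; it does not connect to Impala or validate names.
    """
    statements = []
    for name, value in options.items():
        statements.append(f"set {name}={value};")
    return statements


# Values taken from the workarounds listed in this issue.
workaround_options = {
    "num_scanner_threads": 30,
    "batch_size": 512,
}

if __name__ == "__main__":
    # Print the statements so they can be pasted into an impala-shell session.
    for statement in build_set_statements(workaround_options):
        print(statement)
```

The rendered statements can be run in the same session before the affected query, so that the options apply only to that session.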

      Attachments

        1. tpch_q14_2.5.txt
          278 kB
          Mostafa Mokhtar
        2. tpch_q14_2.6.txt
          281 kB
          Mostafa Mokhtar

          People

            kwho Michael Ho
            mmokhtar Mostafa Mokhtar
