IMPALA-2940

Parquet DictDecoders accumulate throughout query

    Details

      Description

      Parquet dictionary decoders can accumulate throughout query execution. One decoder is created per column per input split. Each decoder holds a vector of the dictionary's values, and that vector is not cleared when the scanner is finished with the decoder.
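      To make the growth pattern concrete, here is a minimal C++ sketch. It is not Impala's code: DictDecoder, AccumulatingScanner, ProcessSplit and the byte accounting are hypothetical stand-ins, assuming each decoder simply owns a std::vector<std::string> copy of its dictionary and is kept alive until the query ends.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Parquet dictionary decoder: it keeps a decoded
// copy of the dictionary values for as long as the decoder object is alive.
struct DictDecoder {
  std::vector<std::string> dict;

  // Counts payload bytes only; vector and string overhead are ignored.
  std::size_t PayloadBytes() const {
    std::size_t total = 0;
    for (const auto& v : dict) total += v.size();
    return total;
  }
};

// Hypothetical scanner reproducing the reported pattern: one decoder per
// column per split, and every decoder is retained until the query ends.
struct AccumulatingScanner {
  std::vector<std::unique_ptr<DictDecoder>> decoders;

  void ProcessSplit(int num_columns, const std::vector<std::string>& dict_values) {
    for (int c = 0; c < num_columns; ++c) {
      auto d = std::make_unique<DictDecoder>();
      d->dict = dict_values;             // decoded dictionary page
      decoders.push_back(std::move(d));  // never released after the split
    }
  }

  std::size_t RetainedPayloadBytes() const {
    std::size_t total = 0;
    for (const auto& d : decoders) total += d->PayloadBytes();
    return total;
  }
};
```

      With C columns and S splits, a scanner like this ends up holding C × S dictionaries instead of at most C, so memory grows with the number of splits scanned rather than staying bounded per split.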

      I've attached a graph of memory usage when running the query below on TPC-DS scale factor 100. "Before" is cdh5-trunk; "after" is with a fix that deletes the ColumnReader objects after each input split.

      use tpcds_100_parquet;
      set num_scanner_threads=8;
      select * from store_sales where ss_sold_time_sk = 12512434;

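      The fix of deleting the ColumnReader objects after each input split can be sketched as follows. This is a simplified, hypothetical model, not Impala's actual scanner: the ColumnReader struct here only owns a dictionary vector, and Scanner, ProcessSplit and the peak counter are illustrative names. The readers are scoped to a single split, so their dictionaries are freed when the split finishes.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified per-column reader that owns the decoded
// dictionary values for its column.
struct ColumnReader {
  std::vector<std::string> dict_values;
};

// Readers live only for the duration of one input split.
class Scanner {
 public:
  void ProcessSplit(int num_columns, const std::vector<std::string>& dict) {
    for (int c = 0; c < num_columns; ++c) {
      auto reader = std::make_unique<ColumnReader>();
      reader->dict_values = dict;
      readers_.push_back(std::move(reader));
    }
    peak_readers_ = std::max(peak_readers_, readers_.size());
    // ... decode and materialize this split's rows here ...
    readers_.clear();  // unique_ptr destructors delete each ColumnReader
  }

  std::size_t live_readers() const { return readers_.size(); }
  std::size_t peak_readers() const { return peak_readers_; }

 private:
  std::vector<std::unique_ptr<ColumnReader>> readers_;
  std::size_t peak_readers_ = 0;
};
```

      Holding readers in std::unique_ptr makes the cleanup a single clear() call, and the number of live dictionaries never exceeds the column count of one split.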

              People

              • Assignee: Tim Armstrong (tarmstrong)
                Reporter: Tim Armstrong (tarmstrong)