IMPALA-2940

Parquet DictDecoders accumulate throughout query


Details

    Description

      Parquet dictionary decoders can accumulate throughout query execution. One is created per column per split. Each decoder contains a vector of dictionary values that is not cleared when the scanner is finished with it.

      I've attached a graph of memory usage when running this query on TPC-DS scale factor 100. "Before" is cdh5-trunk, and "after" is with a fix that deletes the ColumnReader objects after each input split.

      use tpcds_100_parquet;                                                           
      set num_scanner_threads=8;
      select * from store_sales where ss_sold_time_sk = 12512434;
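
To illustrate the accumulation pattern described above, here is a minimal, self-contained C++ sketch. It is not Impala's actual code: `DictDecoder`, `ColumnReader`, `Scanner`, `ProcessSplit`, and `LiveDictEntries` are hypothetical stand-ins, and the `release_after_split` flag only models the difference between the "before" and "after" behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a Parquet dictionary decoder: it owns a vector
// holding the decoded dictionary values for one column of one split.
struct DictDecoder {
  std::vector<std::int64_t> dict;
  explicit DictDecoder(std::size_t num_entries) : dict(num_entries, 0) {}
};

// Hypothetical stand-in for a column reader, which owns one decoder.
struct ColumnReader {
  DictDecoder decoder;
  explicit ColumnReader(std::size_t num_entries) : decoder(num_entries) {}
};

// Hypothetical scanner that creates one reader per column for every split.
class Scanner {
 public:
  explicit Scanner(bool release_after_split)
      : release_after_split_(release_after_split) {}

  void ProcessSplit(std::size_t num_columns, std::size_t dict_entries) {
    for (std::size_t i = 0; i < num_columns; ++i) {
      readers_.push_back(std::make_unique<ColumnReader>(dict_entries));
    }
    // ... rows for this split would be materialized here ...
    if (release_after_split_) {
      // Models the fix: drop the readers (and their dictionaries) as soon
      // as the split is done, instead of keeping them until the query ends.
      readers_.clear();
    }
  }

  // Total number of dictionary values still held in memory.
  std::size_t LiveDictEntries() const {
    std::size_t total = 0;
    for (const auto& reader : readers_) total += reader->decoder.dict.size();
    return total;
  }

 private:
  bool release_after_split_;
  std::vector<std::unique_ptr<ColumnReader>> readers_;
};
```

In this model, a `Scanner(false)` that processes many splits keeps every dictionary alive, so `LiveDictEntries()` grows with the number of splits, while a `Scanner(true)` returns to zero after each split.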
      

      Attachments

        1. mem-usage-dict-leak.png
          5 kB
          Tim Armstrong


            People

              tarmstrong Tim Armstrong
