Parquet dictionary decoders can accumulate throughout query execution. One is created per-column per-split, and each decoder holds a vector of dictionary values that is not cleared when the scanner is finished with it.
I've attached a graph of memory usage when running this query on TPC-DS scale factor 100. "Before" is cdh5-trunk, and "after" is with a fix that deletes the ColumnReader objects after each input split.
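A minimal sketch of the leak pattern and the fix, using hypothetical `Scanner`, `ColumnReader`, and `DictDecoder` types (not the actual Impala classes): each split creates one reader per column, each reader owns a dictionary value vector, and nothing frees them until the fix clears the readers when the split finishes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical per-column dictionary decoder: holds the decoded
// dictionary values for one column of one split.
struct DictDecoder {
  std::vector<int64_t> dict_values;
};

struct ColumnReader {
  DictDecoder decoder;
};

class Scanner {
 public:
  // One ColumnReader (and thus one dictionary) is created
  // per column for every input split processed.
  void ProcessSplit(int num_cols, int dict_size) {
    for (int c = 0; c < num_cols; ++c) {
      auto reader = std::make_unique<ColumnReader>();
      reader->decoder.dict_values.assign(dict_size, 0);
      readers_.push_back(std::move(reader));
    }
  }

  // The fix: delete the ColumnReader objects (and their dictionary
  // vectors) after each split instead of letting them accumulate.
  void CloseSplit() { readers_.clear(); }

  // Bytes currently held by live dictionary vectors.
  size_t LiveDictBytes() const {
    size_t bytes = 0;
    for (const auto& r : readers_)
      bytes += r->decoder.dict_values.size() * sizeof(int64_t);
    return bytes;
  }

 private:
  std::vector<std::unique_ptr<ColumnReader>> readers_;
};
```

Without `CloseSplit()`, dictionary memory grows linearly with the number of splits scanned; with it, usage stays bounded by a single split's working set.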