Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
From a performance test on TPC-DS 3000 executed by rizaon, we noticed that runtime filters are applied only at row level.
It is known that runtime filters are not applied at file/partition level on Iceberg tables (IMPALA-10453). However, they could still be applied at Parquet row group level, and I think achieving this is much easier than fixing IMPALA-10453.
E.g. here is a snippet of the runtime profile of q49 of TPC-DS:
Filter 0 (8.00 KB) [108 instances]:
   - Files processed: 0 (0)
   - Files rejected: 0 (0)
   - Files total: 0 (0)
   - InactiveTotalTime: 0.000ns
   - RowGroups processed: 0 (0)
   - RowGroups rejected: 0 (0)
   - RowGroups total: 0 (0)
   - Rows processed: 19.34M (19335783)
   - Rows rejected: 19.32M (19323695)
   - Rows total: 20.00M (19999711)
   - Splits processed: 0 (0)
   - Splits rejected: 0 (0)
   - Splits total: 0 (0)
   - TotalTime: 0.000ns
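In this profile the filter rejects 19.32M of the 19.34M rows it processes, yet no row group is ever evaluated against it. We could save a lot of IO by applying the filters at row group level.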
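To illustrate the idea, here is a minimal sketch of row-group-level pruning. It is not Impala's implementation. It assumes the filtered column of each row group is fully dictionary-encoded, so the dictionary lists every distinct non-NULL value in that row group. `RowGroup`, `row_group_may_match` and `prune_row_groups` are hypothetical names, a plain Python set stands in for the runtime filter, and NULL handling is ignored.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group."""
    name: str
    dictionary: List[int]  # distinct values of the filtered column (assumed complete)
    num_rows: int


def row_group_may_match(dictionary: List[int],
                        filter_contains: Callable[[int], bool]) -> bool:
    """Return True if at least one dictionary value passes the filter."""
    return any(filter_contains(value) for value in dictionary)


def prune_row_groups(row_groups: List[RowGroup],
                     filter_contains: Callable[[int], bool]
                     ) -> Tuple[List[RowGroup], int]:
    """Split row groups into those to scan and a count of rows skipped entirely."""
    kept: List[RowGroup] = []
    rows_skipped = 0
    for rg in row_groups:
        if row_group_may_match(rg.dictionary, filter_contains):
            kept.append(rg)
        else:
            # No value in this row group can pass, so its pages need not be read.
            rows_skipped += rg.num_rows
    return kept, rows_skipped


# Illustrative data only: the build side of the join produced these keys.
build_side_keys = {5, 42}
row_groups = [
    RowGroup("rg0", [1, 2, 3], 1000),
    RowGroup("rg1", [40, 41, 42], 2000),
    RowGroup("rg2", [7, 8], 500),
]
kept, rows_skipped = prune_row_groups(row_groups, build_side_keys.__contains__)
print([rg.name for rg in kept], rows_skipped)
```

With a real Bloom filter the membership check can return false positives, so some row groups would be kept unnecessarily, but a row group containing a matching value would never be skipped.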
Issue Links
- causes
  - IMPALA-13193 RuntimeFilter on parquet dictionary should evaluate null values (Resolved)
- relates to
  - IMPALA-5509 Runtime filter : Extend runtime filter to support Dictionary values (Resolved)