Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: Impala 2.0, Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
- Fix Version/s: None
Description
When computing the per-host memory estimate for local aggregations, the planner does not take into account that data is randomly distributed across nodes, which leads to significant underestimation in some cases. The suggested fix is to use min(agg input cardinality, NDV * #hosts) as the per-node cardinality behind the per-node memory estimate.
Impact: In the query below, the planner significantly underestimates the per-node memory of agg node 03 (estimated 3.28GB vs. an actual peak of 24.77GB).
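For illustration, here is a minimal sketch of the suggested cap as a standalone Java helper; the class and method names are hypothetical and this is not Impala's actual planner code:

// Sketch of the suggested per-node cardinality cap for a local (pre-merge)
// aggregation. Hypothetical names; not Impala's planner API.
public final class LocalAggCardinalitySketch {

  // Suggested estimate: min(agg input cardinality, NDV * #hosts).
  // With randomly distributed input, each of the #hosts nodes can see up to
  // NDV distinct groups, so the aggregation's total output is bounded by
  // NDV * #hosts rows, not by NDV alone. (Assumes no long overflow for
  // realistic NDV and host counts.)
  static long cappedAggCardinality(long inputCardinality, long ndv, int numHosts) {
    return Math.min(inputCardinality, ndv * numHosts);
  }

  public static void main(String[] args) {
    // Numbers for agg node 03 from the plan below: 5,999,989,709 input rows
    // (the lineitem scan), NDV(l_partkey) = 200,052,064, and 20 hosts.
    long capped = cappedAggCardinality(5_999_989_709L, 200_052_064L, 20);
    System.out.println(capped); // 4001041280, i.e. ~4.0B
  }
}

With these inputs the cap works out to min(6.0B, 200.05M * 20) = ~4.0B rows, which tracks the 3.11B rows node 03 actually produced far better than the 200.05M-row estimate underlying the 3.28GB memory figure.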
Query
select sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem, part
where p_partkey = l_partkey
  and p_brand = 'Brand#23'
  and p_container = 'MED BOX'
  and l_quantity < (
    select 0.2 * avg(l_quantity)
    from lineitem
    where l_partkey = p_partkey
  )
Plan
12:AGGREGATE [FINALIZE]
|  output: sum:merge(l_extendedprice)
|  hosts=20 per-host-mem=unavailable
|  tuple-ids=6 row-size=16B cardinality=1
|
11:EXCHANGE [UNPARTITIONED]
|  hosts=20 per-host-mem=unavailable
|  tuple-ids=6 row-size=16B cardinality=1
|
06:AGGREGATE
|  output: sum(l_extendedprice)
|  hosts=20 per-host-mem=10.00MB
|  tuple-ids=6 row-size=16B cardinality=1
|
05:HASH JOIN [RIGHT SEMI JOIN, PARTITIONED]
|  hash predicates: l_partkey = p_partkey
|  other join predicates: l_quantity < 0.2 * avg(l_quantity)
|  hosts=20 per-host-mem=125.18MB
|  tuple-ids=0,1 row-size=80B cardinality=29992141
|
|--10:EXCHANGE [HASH(p_partkey)]
|  |  hosts=20 per-host-mem=0B
|  |  tuple-ids=0,1 row-size=80B cardinality=29992141
|  |
|  04:HASH JOIN [INNER JOIN, BROADCAST]
|  |  hash predicates: l_partkey = p_partkey
|  |  hosts=20 per-host-mem=58.30MB
|  |  tuple-ids=0,1 row-size=80B cardinality=29992141
|  |
|  |--09:EXCHANGE [BROADCAST]
|  |  |  hosts=20 per-host-mem=0B
|  |  |  tuple-ids=1 row-size=56B cardinality=1000000
|  |  |
|  |  01:SCAN HDFS [tpch_1000_decimal_parquet.part, RANDOM]
|  |     partitions=1/1 files=40 size=6.38GB
|  |     predicates: p_brand = 'Brand#23', p_container = 'MED BOX'
|  |     table stats: 200000000 rows total
|  |     column stats: all
|  |     hosts=20 per-host-mem=264.00MB
|  |     tuple-ids=1 row-size=56B cardinality=1000000
|  |
|  00:SCAN HDFS [tpch_1000_decimal_parquet.lineitem, RANDOM]
|     partitions=1/1 files=880 size=216.61GB
|     table stats: 5999989709 rows total
|     column stats: all
|     hosts=20 per-host-mem=264.00MB
|     tuple-ids=0 row-size=24B cardinality=5999989709
|
08:AGGREGATE [FINALIZE]
|  output: avg:merge(l_quantity)
|  group by: l_partkey
|  hosts=20 per-host-mem=167.89MB
|  tuple-ids=4 row-size=16B cardinality=200052064
|
07:EXCHANGE [HASH(l_partkey)]
|  hosts=20 per-host-mem=0B
|  tuple-ids=3 row-size=16B cardinality=200052064
|
03:AGGREGATE
|  output: avg(l_quantity)
|  group by: l_partkey
|  hosts=20 per-host-mem=3.28GB
|  tuple-ids=3 row-size=16B cardinality=200052064
|
02:SCAN HDFS [tpch_1000_decimal_parquet.lineitem, RANDOM]
   partitions=1/1 files=880 size=216.61GB
   table stats: 5999989709 rows total
   column stats: all
   hosts=20 per-host-mem=176.00MB
   tuple-ids=2 row-size=16B cardinality=5999989709
Summary
Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail
---|---|---|---|---|---|---|---|---
12:AGGREGATE | 1 | 256.620ms | 256.620ms | 1 | 1 | 92.00 KB | -1.00 B | FINALIZE
11:EXCHANGE | 1 | 184.430us | 184.430us | 20 | 1 | 0 | -1.00 B | UNPARTITIONED
06:AGGREGATE | 20 | 364.045ms | 1s508ms | 20 | 1 | 9.37 MB | 10.00 MB |
05:HASH JOIN | 20 | 279.175ms | 304.600ms | 523.09K | 29.99M | 155.04 MB | 125.18 MB | RIGHT SEMI JOIN, PARTITIONED
|--10:EXCHANGE | 20 | 22.448ms | 32.954ms | 5.98M | 29.99M | 0 | 0 | HASH(p_partkey)
|  04:HASH JOIN | 20 | 25s417ms | 35s579ms | 5.98M | 29.99M | 146.02 MB | 58.30 MB | INNER JOIN, BROADCAST
|  |--09:EXCHANGE | 20 | 16.270ms | 35.329ms | 199.30K | 1.00M | 0 | 0 | BROADCAST
|  |  01:SCAN HDFS | 20 | 218.505ms | 331.299ms | 199.30K | 1.00M | 173.43 MB | 264.00 MB | tpch_1000_decimal_parquet.part
|  00:SCAN HDFS | 20 | 1s365ms | 1s822ms | 6.00B | 6.00B | 1.92 GB | 264.00 MB | tpch_1000_decimal_parquet.l...
08:AGGREGATE | 20 | 29s706ms | 35s917ms | 200.00M | 200.05M | 1.64 GB | 167.89 MB | FINALIZE
07:EXCHANGE | 20 | 5s081ms | 8s410ms | 3.11B | 200.05M | 0 | 0 | HASH(l_partkey)
03:AGGREGATE | 20 | 4m10s | 5m12s | 3.11B | 200.05M | 24.77 GB | 3.28 GB |
02:SCAN HDFS | 20 | 1s544ms | 2s517ms | 6.00B | 6.00B | 838.85 MB | 176.00 MB | tpch_1000_decimal_parquet.l...
Issue Links
- depends upon
  - IMPALA-6460 More flexible memory-based admission control policies (Resolved)
- is related to
  - IMPALA-9591 Automatically cap pre-aggregation memory consumption to match estimate (Open)
  - IMPALA-9765 Preaggregations should stop expanding if memory is scarce (Open)
  - IMPALA-5013 Re-evaluate our approach to per-operator memory estimates (Resolved)
  - IMPALA-10377 Improve the accuracy of resource estimation (Resolved)