Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Won't Fix
-
Impala 2.9.0
-
None
-
ghx-label-2
Description
Reducing the partition fanout would reduce the minimum memory requirement. The downside is that it can increase the number of required repartitions while spilling, or require spilling a bit more data (because the partition granularity is larger).
There's no effect on in-memory perf on TPC-H. On targeted perf there was one regression because the broadcast join spilled a partition:
+-----------+-----------------------+---------+------------+------------+----------------+ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-----------+-----------------------+---------+------------+------------+----------------+ | TPCH(_60) | parquet / none / none | 17.62 | +0.23% | 12.10 | -0.16% | +-----------+-----------------------+---------+------------+------------+----------------+ +-----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +-----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ | TPCH(_60) | TPCH-Q3 | parquet / none / none | 11.88 | 11.36 | +4.56% | 0.02% | 0.39% | 1 | 5 | | TPCH(_60) | TPCH-Q21 | parquet / none / none | 68.54 | 67.43 | +1.64% | 1.60% | 0.33% | 1 | 5 | | TPCH(_60) | TPCH-Q11 | parquet / none / none | 2.44 | 2.40 | +1.60% | 5.36% | 3.88% | 1 | 5 | | TPCH(_60) | TPCH-Q7 | parquet / none / none | 36.05 | 35.49 | +1.59% | 0.42% | 0.27% | 1 | 5 | | TPCH(_60) | TPCH-Q20 | parquet / none / none | 7.86 | 7.80 | +0.80% | 0.77% | 0.58% | 1 | 5 | | TPCH(_60) | TPCH-Q19 | parquet / none / none | 10.84 | 10.79 | +0.49% | 1.57% | 1.54% | 1 | 5 | | TPCH(_60) | TPCH-Q22 | parquet / none / none | 6.49 | 6.46 | +0.38% | 0.83% | 0.48% | 1 | 5 | | TPCH(_60) | TPCH-Q8 | parquet / none / none | 12.21 | 12.18 | +0.27% | 0.67% | 1.31% | 1 | 5 | | TPCH(_60) | TPCH-Q5 | parquet / none / none | 9.64 | 9.62 | +0.24% | 0.45% | 0.43% | 1 | 5 | | TPCH(_60) | TPCH-Q15 | parquet / none / none | 10.94 | 10.92 | +0.18% | 0.38% | 0.60% | 1 | 5 | | TPCH(_60) | TPCH-Q1 | parquet / none / none | 27.71 | 27.70 | +0.05% | 0.53% | 0.64% | 1 | 5 | | TPCH(_60) | TPCH-Q18 | parquet / none / none | 56.23 | 56.29 | -0.11% | 0.94% | 0.55% | 1 | 5 | | TPCH(_60) | TPCH-Q10 | parquet / none / none | 13.50 | 13.53 | -0.18% | 0.50% | 1.86% | 1 | 5 | | TPCH(_60) | TPCH-Q17 | parquet / none / none | 16.45 | 16.55 | -0.60% | 1.06% | 0.82% | 1 | 5 | | TPCH(_60) | TPCH-Q12 | parquet / none / none | 9.33 | 9.39 | -0.62% | 1.04% | 1.53% | 1 | 5 | | TPCH(_60) | TPCH-Q9 | parquet / none / none | 36.26 | 36.59 | -0.92% | 0.63% | 0.31% | 1 | 5 | | TPCH(_60) | TPCH-Q6 | parquet / none / none | 4.58 | 4.63 | -1.04% | 0.79% | 0.47% | 1 | 5 | | TPCH(_60) | TPCH-Q4 | parquet / none / none | 7.86 | 7.96 | -1.31% | 0.29% | 0.81% | 1 | 5 | | TPCH(_60) | TPCH-Q13 | parquet / none / none | 21.97 | 22.33 | -1.60% | 0.19% | 0.67% | 1 | 5 | | TPCH(_60) | TPCH-Q2 | parquet / none / none | 3.22 | 3.29 | -2.23% | 3.87% | 0.77% | 1 | 5 | | TPCH(_60) | TPCH-Q14 | parquet / none / none | 8.06 | 8.25 | -2.33% | 1.83% | 3.19% | 1 | 5 | | TPCH(_60) | TPCH-Q16 | parquet / none / none | 5.47 | 5.69 | -4.02% | 0.85% | 1.35% | 1 | 5 | +-----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
+--------------------+-----------------------+---------+------------+------------+----------------+ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--------------------+-----------------------+---------+------------+------------+----------------+ | TARGETED-PERF(_60) | parquet / none / none | 18.92 | +2.34% | 5.00 | +1.85% | +--------------------+-----------------------+---------+------------+------------+----------------+ +--------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+------- ------+-------+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Cl ients | Iters | +--------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+------- ------+-------+ | TARGETED-PERF(_60) | primitive_exchange_broadcast | parquet / none / none | 57.73 | 39.60 | R +45.78% | 0.32% | 2.60% | 1 | 5 | | TARGETED-PERF(_60) | primitive_conjunct_ordering_1 | parquet / none / none | 0.11 | 0.09 | R +31.01% | * 21.46% * | * 26.37% * | 1 | 5 | | TARGETED-PERF(_60) | primitive_topn_bigint | parquet / none / none | 5.19 | 4.41 | +17.76% | * 14.63% * | 8.25% | 1 | 5 | | TARGETED-PERF(_60) | primitive_filter_in_predicate | parquet / none / none | 2.39 | 2.27 | +5.06% | 4.95% | 4.89% | 1 | 5 | | TARGETED-PERF(_60) | primitive_exchange_shuffle | parquet / none / none | 85.73 | 82.08 | +4.44% | 2.41% | 0.55% | 1 | 5 | | TARGETED-PERF(_60) | primitive_filter_string_non_selective | parquet / none / none | 1.89 | 1.84 | +2.56% | 1.27% | 1.16% | 1 | 5 | | TARGETED-PERF(_60) | primitive_groupby_bigint_lowndv | parquet / none / none | 3.85 | 3.77 | +2.11% | 2.73% | 1.72% | 1 | 5 | | TARGETED-PERF(_60) | primitive_filter_bigint_selective | parquet / none / none | 0.18 | 0.17 | +1.95% | 6.48% | 1.66% | 1 | 5 | | TARGETED-PERF(_60) | primitive_filter_decimal_selective | parquet / none / none | 1.66 | 1.63 | +1.74% | 1.88% | 0.17% | 1 | 5 | | TARGETED-PERF(_60) | PERF_STRING-Q3 | parquet / none / none | 3.76 | 3.70 | +1.59% | 0.62% | 0.62% | 1 | 5 | | TARGETED-PERF(_60) | primitive_filter_string_selective | parquet / none / none | 1.94 | 1.92 | +1.36% | 1.35% | 1.24% | 1 | 5 | | TARGETED-PERF(_60) | primitive_shuffle_join_union_all_with_groupby | parquet / none / none | 49.09 | 48.47 | +1.29% | 0.11% | 0.30% | 1 | 5 | | TARGETED-PERF(_60) | primitive_filter_decimal_non_selective | parquet / none / none | 1.70 | 1.68 | +1.28% | 2.00% | 1.80% | 1 | 5 | | TARGETED-PERF(_60) | primitive_filter_string_like | parquet / none / none | 14.83 | 14.66 | +1.17% | 0.01% | 0.14% | 1 | 5 |
I'm not considering reducing the aggregation fanout because there's quite a strong "pre-partitioning" effect between the pre-aggregation and merge aggregation - reducing the fanout led to big regressions on some large aggregations in my experiment:
Report Generated on 2017-04-07
Run Description: "Base: c24142491b3e140aac8c6818668472707d015cc7 vs Ref: e1395f670f30df823adf682101b00fc72e15bd71"
Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version: impalad version 2.9.0-SNAPSHOT RELEASE ()
Baseline Impala Version: impalad version 2.9.0-SNAPSHOT RELEASE ()
+-----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-----------+-----------------------+---------+------------+------------+----------------+
| TPCH(_60) | parquet / none / none | 18.84 | +3.78% | 12.28 | -0.19% |
+-----------+-----------------------+---------+------------+------------+----------------+
+-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| TPCH(_60) | TPCH-Q18 | parquet / none / none | 86.46 | 56.09 | R +54.15% | * 12.70% * | 0.60% | 1 | 5 |
| TPCH(_60) | TPCH-Q17 | parquet / none / none | 17.71 | 16.32 | +8.55% | 0.77% | 1.05% | 1 | 5 |
| TPCH(_60) | TPCH-Q2 | parquet / none / none | 3.21 | 3.13 | +2.73% | 5.51% | 3.46% | 1 | 5 |
| TPCH(_60) | TPCH-Q10 | parquet / none / none | 13.63 | 13.31 | +2.41% | 1.04% | 0.97% | 1 | 5 |
| TPCH(_60) | TPCH-Q15 | parquet / none / none | 10.99 | 10.86 | +1.12% | 0.29% | 0.49% | 1 | 5 |
| TPCH(_60) | TPCH-Q11 | parquet / none / none | 2.43 | 2.41 | +0.64% | 2.18% | 3.88% | 1 | 5 |
| TPCH(_60) | TPCH-Q14 | parquet / none / none | 8.07 | 8.05 | +0.15% | 1.59% | 1.00% | 1 | 5 |
| TPCH(_60) | TPCH-Q13 | parquet / none / none | 21.97 | 21.94 | +0.13% | 0.30% | 0.19% | 1 | 5 |
| TPCH(_60) | TPCH-Q20 | parquet / none / none | 7.80 | 7.80 | +0.03% | 0.42% | 0.63% | 1 | 5 |
| TPCH(_60) | TPCH-Q8 | parquet / none / none | 12.16 | 12.17 | -0.09% | 1.14% | 0.54% | 1 | 5 |
| TPCH(_60) | TPCH-Q5 | parquet / none / none | 9.64 | 9.68 | -0.44% | 0.54% | 0.48% | 1 | 5 |
| TPCH(_60) | TPCH-Q22 | parquet / none / none | 6.52 | 6.56 | -0.66% | 0.94% | 0.77% | 1 | 5 |
| TPCH(_60) | TPCH-Q12 | parquet / none / none | 9.30 | 9.37 | -0.81% | 0.19% | 1.81% | 1 | 5 |
| TPCH(_60) | TPCH-Q6 | parquet / none / none | 4.56 | 4.60 | -0.84% | 0.60% | 0.81% | 1 | 5 |
| TPCH(_60) | TPCH-Q1 | parquet / none / none | 27.55 | 27.84 | -1.03% | 1.32% | 0.92% | 1 | 5 |
| TPCH(_60) | TPCH-Q4 | parquet / none / none | 7.79 | 7.88 | -1.12% | 1.10% | 0.27% | 1 | 5 |
| TPCH(_60) | TPCH-Q21 | parquet / none / none | 67.65 | 69.84 | -3.14% | 0.25% | 0.13% | 1 | 5 |
| TPCH(_60) | TPCH-Q16 | parquet / none / none | 5.45 | 5.67 | -3.73% | 0.96% | 1.46% | 1 | 5 |
| TPCH(_60) | TPCH-Q9 | parquet / none / none | 33.72 | 36.25 | -6.96% | 0.54% | 0.27% | 1 | 5 |
| TPCH(_60) | TPCH-Q19 | parquet / none / none | 10.22 | 11.06 | -7.60% | 0.23% | 0.63% | 1 | 5 |
| TPCH(_60) | TPCH-Q3 | parquet / none / none | 11.86 | 13.32 | -10.96% | 0.62% | 1.00% | 1 | 5 |
| TPCH(_60) | TPCH-Q7 | parquet / none / none | 35.72 | 45.17 | I -20.93% | 0.47% | 6.56% | 1 | 5 |
+-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
(R) Regression: TPCH(_60) TPCH-Q18 [parquet / none / none] (56.09s -> 86.46s [+54.15%])
+--------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+--------+-----------+
| Operator | % of Query | Avg | Base Avg | Delta(Avg) | StdDev(%) | Max | Base Max | Delta(Max) | #Hosts | #Rows | Est #Rows |
+--------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+--------+-----------+
| 13:AGGREGATE | 56.02% | 61.84s | 27.05s | +128.61% | 9.36% | 71.87s | 27.37s | +162.55% | 1 | 3.85K | 8.60M |
| 04:AGGREGATE | 15.87% | 17.52s | 23.23s | -24.59% | * 30.12% * | 26.50s | 23.43s | +13.09% | 1 | 90.00M | 86.02M |
| 05:HASH JOIN | 18.97% | 20.95s | 23.27s | -9.99% | * 14.01% * | 26.01s | 23.56s | +10.37% | 1 | 92.82K | 360.01M |
| 02:SCAN HDFS | 4.13% | 4.56s | 4.62s | -1.31% | 1.14% | 4.64s | 4.71s | -1.42% | 1 | 92.82K | 360.01M |
+--------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+--------+-----------+
(I) Improvement: TPCH(_60) TPCH-Q7 [parquet / none / none] (45.17s -> 35.72s [-20.93%])
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
| Operator | % of Query | Avg | Base Avg | Delta(Avg) | StdDev(%) | Max | Base Max | Delta(Max) | #Hosts | #Rows | Est #Rows |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
| 11:AGGREGATE | 4.63% | 1.75s | 1.80s | -2.69% | 0.30% | 1.76s | 1.86s | -5.38% | 1 | 4 | 1.64M |
| 10:HASH JOIN | 10.12% | 3.83s | 4.33s | -11.56% | 0.50% | 3.86s | 4.50s | -14.20% | 1 | 350.34K | 36.00M |
| 09:HASH JOIN | 5.43% | 2.06s | 2.52s | -18.44% | 1.58% | 2.11s | 2.66s | -20.51% | 1 | 109.37M | 36.00M |
| 08:HASH JOIN | 9.95% | 3.77s | 5.09s | -25.84% | 0.31% | 3.79s | 5.34s | -29.10% | 1 | 109.37M | 36.00M |
| 03:SCAN HDFS | 2.47% | 935.77ms | 947.71ms | -1.26% | 1.11% | 945.37ms | 969.54ms | -2.49% | 1 | 9.00M | 9.00M |
| 07:HASH JOIN | 34.60% | 13.11s | 19.43s | -32.54% | 0.51% | 13.21s | 21.49s | -38.55% | 1 | 109.37M | 36.00M |
| 14:EXCHANGE | 2.50% | 947.18ms | 931.62ms | +1.67% | 1.96% | 978.10ms | 948.27ms | +3.14% | 1 | 90.00M | 90.00M |
| 02:SCAN HDFS | 3.14% | 1.19s | 1.20s | -0.97% | 0.69% | 1.20s | 1.22s | -1.35% | 1 | 90.00M | 90.00M |
| 06:HASH JOIN | 23.24% | 8.81s | 9.67s | -8.92% | 0.52% | 8.87s | 10.16s | -12.71% | 1 | 109.37M | 36.00M |
| 00:SCAN HDFS | 2.32% | 879.94ms | 890.98ms | -1.24% | 1.29% | 891.75ms | 892.31ms | -0.06% | 1 | 600.00K | 600.00K |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
(V) Significant Variability: TPCH(_60) TPCH-Q18 [parquet / none / none] (0.60% -> 12.70%)
+--------------+------------+-----------+----------------+------------------+--------+--------+-----------+
| Operator | % of Query | StdDev(%) | Base StdDev(%) | Delta(StdDev(%)) | #Hosts | #Rows | Est #Rows |
+--------------+------------+-----------+----------------+------------------+--------+--------+-----------+
| 04:AGGREGATE | 15.87% | 30.12% | 0.65% | +4526.94% | 1 | 90.00M | 86.02M |
| 05:HASH JOIN | 18.97% | 14.01% | 0.98% | +1323.57% | 1 | 92.82K | 360.01M |
+--------------+------------+-----------+----------------+------------------+--------+--------+-----------+