    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: None
    • Component/s: Backend

      Description

      Reducing the partition fanout would reduce the minimum memory requirement. The downside is that it can increase the number of repartitioning passes required while spilling, or require spilling somewhat more data, because each partition is larger and spilling is therefore coarser-grained.
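The trade-off above can be made concrete with a rough sketch. It is not Impala's implementation: the function names, the 8 MiB buffer size, the fanout values and the assumption of one write buffer per partition with perfectly uniform hashing are all illustrative assumptions.

```python
MIB = 1 << 20
GIB = 1 << 30


def min_reservation_bytes(fanout: int, buffer_bytes: int) -> int:
    """Minimum memory if every partition needs one write buffer (assumption)."""
    return fanout * buffer_bytes


def repartition_levels(input_bytes: int, mem_bytes: int, fanout: int) -> int:
    """Partitioning passes needed until one partition fits in memory.

    Assumes each pass splits the data evenly into `fanout` partitions.
    """
    levels = 0
    size = input_bytes
    while size > mem_bytes:
        size = -(-size // fanout)  # integer ceiling division
        levels += 1
    return levels


if __name__ == "__main__":
    # Hypothetical numbers: 100 GiB of build-side data, 1 GiB of memory.
    for fanout in (16, 8):
        print(
            fanout,
            min_reservation_bytes(fanout, 8 * MIB) // MIB,
            repartition_levels(100 * GIB, 1 * GIB, fanout),
        )
```

Under these assumptions, halving the fanout halves the minimum reservation but adds one more partitioning pass for this input size, which is the cost the description refers to.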

      There's no effect on in-memory performance on TPC-H. On the targeted-perf workload there was one regression (primitive_exchange_broadcast, +45.78%) because the broadcast join spilled a partition:

      +-----------+-----------------------+---------+------------+------------+----------------+
      | Workload  | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
      +-----------+-----------------------+---------+------------+------------+----------------+
      | TPCH(_60) | parquet / none / none | 17.62   | +0.23%     | 12.10      | -0.16%         |
      +-----------+-----------------------+---------+------------+------------+----------------+
      
      +-----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
      | Workload  | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
      +-----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
      | TPCH(_60) | TPCH-Q3  | parquet / none / none | 11.88  | 11.36       |   +4.56%   |   0.02%   |   0.39%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q21 | parquet / none / none | 68.54  | 67.43       |   +1.64%   |   1.60%   |   0.33%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q11 | parquet / none / none | 2.44   | 2.40        |   +1.60%   |   5.36%   |   3.88%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q7  | parquet / none / none | 36.05  | 35.49       |   +1.59%   |   0.42%   |   0.27%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q20 | parquet / none / none | 7.86   | 7.80        |   +0.80%   |   0.77%   |   0.58%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q19 | parquet / none / none | 10.84  | 10.79       |   +0.49%   |   1.57%   |   1.54%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q22 | parquet / none / none | 6.49   | 6.46        |   +0.38%   |   0.83%   |   0.48%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q8  | parquet / none / none | 12.21  | 12.18       |   +0.27%   |   0.67%   |   1.31%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q5  | parquet / none / none | 9.64   | 9.62        |   +0.24%   |   0.45%   |   0.43%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q15 | parquet / none / none | 10.94  | 10.92       |   +0.18%   |   0.38%   |   0.60%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q1  | parquet / none / none | 27.71  | 27.70       |   +0.05%   |   0.53%   |   0.64%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q18 | parquet / none / none | 56.23  | 56.29       |   -0.11%   |   0.94%   |   0.55%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q10 | parquet / none / none | 13.50  | 13.53       |   -0.18%   |   0.50%   |   1.86%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q17 | parquet / none / none | 16.45  | 16.55       |   -0.60%   |   1.06%   |   0.82%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q12 | parquet / none / none | 9.33   | 9.39        |   -0.62%   |   1.04%   |   1.53%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q9  | parquet / none / none | 36.26  | 36.59       |   -0.92%   |   0.63%   |   0.31%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q6  | parquet / none / none | 4.58   | 4.63        |   -1.04%   |   0.79%   |   0.47%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q4  | parquet / none / none | 7.86   | 7.96        |   -1.31%   |   0.29%   |   0.81%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q13 | parquet / none / none | 21.97  | 22.33       |   -1.60%   |   0.19%   |   0.67%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q2  | parquet / none / none | 3.22   | 3.29        |   -2.23%   |   3.87%   |   0.77%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q14 | parquet / none / none | 8.06   | 8.25        |   -2.33%   |   1.83%   |   3.19%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q16 | parquet / none / none | 5.47   | 5.69        |   -4.02%   |   0.85%   |   1.35%        | 1           | 5     |
      +-----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
      
      +--------------------+-----------------------+---------+------------+------------+----------------+
      | Workload           | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
      +--------------------+-----------------------+---------+------------+------------+----------------+
      | TARGETED-PERF(_60) | parquet / none / none | 18.92   | +2.34%     | 5.00       | +1.85%         |
      +--------------------+-----------------------+---------+------------+------------+----------------+
      
      +--------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
      | Workload           | Query                                                  | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
      +--------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
      | TARGETED-PERF(_60) | primitive_exchange_broadcast                           | parquet / none / none | 57.73  | 39.60       | R +45.78%  |   0.32%    |   2.60%        | 1           | 5     |
      | TARGETED-PERF(_60) | primitive_conjunct_ordering_1                          | parquet / none / none | 0.11   | 0.09        | R +31.01%  | * 21.46% * | * 26.37% *     | 1           | 5     |
      | TARGETED-PERF(_60) | primitive_topn_bigint                                  | parquet / none / none | 5.19   | 4.41        |   +17.76%  | * 14.63% * |   8.25%        | 1           | 5     |
      | TARGETED-PERF(_60) | primitive_filter_in_predicate                          | parquet / none / none | 2.39   | 2.27        |   +5.06%   |   4.95%    |   4.89%        | 1           | 5     |
      | TARGETED-PERF(_60) | primitive_exchange_shuffle                             | parquet / none / none | 85.73  | 82.08       |   +4.44%   |   2.41%    |   0.55%        | 1           | 5     |
      | TARGETED-PERF(_60) | primitive_filter_string_non_selective                  | parquet / none / none | 1.89   | 1.84        |   +2.56%   |   1.27%    |   1.16%        | 1           | 5     |
      | TARGETED-PERF(_60) | primitive_groupby_bigint_lowndv                        | parquet / none / none | 3.85   | 3.77        |   +2.11%   |   2.73%    |   1.72%        | 1           | 5     |
      | TARGETED-PERF(_60) | primitive_filter_bigint_selective                      | parquet / none / none | 0.18   | 0.17        |   +1.95%   |   6.48%    |   1.66%        | 1           | 5     |
      | TARGETED-PERF(_60) | primitive_filter_decimal_selective                     | parquet / none / none | 1.66   | 1.63        |   +1.74%   |   1.88%    |   0.17%        | 1           | 5     |
      | TARGETED-PERF(_60) | PERF_STRING-Q3                                         | parquet / none / none | 3.76   | 3.70        |   +1.59%   |   0.62%    |   0.62%        | 1           | 5     |
      | TARGETED-PERF(_60) | primitive_filter_string_selective                      | parquet / none / none | 1.94   | 1.92        |   +1.36%   |   1.35%    |   1.24%        | 1           | 5     |
      | TARGETED-PERF(_60) | primitive_shuffle_join_union_all_with_groupby          | parquet / none / none | 49.09  | 48.47       |   +1.29%   |   0.11%    |   0.30%        | 1           | 5     |
      | TARGETED-PERF(_60) | primitive_filter_decimal_non_selective                 | parquet / none / none | 1.70   | 1.68        |   +1.28%   |   2.00%    |   1.80%        | 1           | 5     |
      | TARGETED-PERF(_60) | primitive_filter_string_like                           | parquet / none / none | 14.83  | 14.66       |   +1.17%   |   0.01%    |   0.14%        | 1           | 5     |
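For reading the Delta(Avg) column, the figures are consistent with a plain relative change of Avg(s) against Base Avg(s). The sketch below assumes that formula. It is inferred from the numbers, not taken from the report tool, and the function name is hypothetical. Because the table shows rounded averages, a recomputed delta can differ from the reported one in the last digit.

```python
def delta_avg_pct(avg_s: float, base_avg_s: float) -> float:
    """Relative change of the new average against the baseline, in percent."""
    return (avg_s - base_avg_s) / base_avg_s * 100.0


if __name__ == "__main__":
    # primitive_exchange_broadcast: Avg 57.73 s vs Base Avg 39.60 s
    print(f"{delta_avg_pct(57.73, 39.60):+.2f}%")
```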
      
      

      I'm not considering reducing the aggregation fanout because there is a strong "pre-partitioning" effect between the pre-aggregation and the merge aggregation. Reducing the fanout led to big regressions on some large aggregations in my experiment (for example, TPCH-Q18 regressed by 54.15%):

      Report Generated on 2017-04-07
      Run Description: "Base: c24142491b3e140aac8c6818668472707d015cc7 vs Ref: e1395f670f30df823adf682101b00fc72e15bd71"
      
      Cluster Name: UNKNOWN
      Lab Run Info: UNKNOWN
      Impala Version:          impalad version 2.9.0-SNAPSHOT RELEASE ()
      Baseline Impala Version: impalad version 2.9.0-SNAPSHOT RELEASE ()
      
      +-----------+-----------------------+---------+------------+------------+----------------+
      | Workload  | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
      +-----------+-----------------------+---------+------------+------------+----------------+
      | TPCH(_60) | parquet / none / none | 18.84   | +3.78%     | 12.28      | -0.19%         |
      +-----------+-----------------------+---------+------------+------------+----------------+
      
      +-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
      | Workload  | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
      +-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
      | TPCH(_60) | TPCH-Q18 | parquet / none / none | 86.46  | 56.09       | R +54.15%  | * 12.70% * |   0.60%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q17 | parquet / none / none | 17.71  | 16.32       |   +8.55%   |   0.77%    |   1.05%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q2  | parquet / none / none | 3.21   | 3.13        |   +2.73%   |   5.51%    |   3.46%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q10 | parquet / none / none | 13.63  | 13.31       |   +2.41%   |   1.04%    |   0.97%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q15 | parquet / none / none | 10.99  | 10.86       |   +1.12%   |   0.29%    |   0.49%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q11 | parquet / none / none | 2.43   | 2.41        |   +0.64%   |   2.18%    |   3.88%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q14 | parquet / none / none | 8.07   | 8.05        |   +0.15%   |   1.59%    |   1.00%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q13 | parquet / none / none | 21.97  | 21.94       |   +0.13%   |   0.30%    |   0.19%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q20 | parquet / none / none | 7.80   | 7.80        |   +0.03%   |   0.42%    |   0.63%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q8  | parquet / none / none | 12.16  | 12.17       |   -0.09%   |   1.14%    |   0.54%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q5  | parquet / none / none | 9.64   | 9.68        |   -0.44%   |   0.54%    |   0.48%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q22 | parquet / none / none | 6.52   | 6.56        |   -0.66%   |   0.94%    |   0.77%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q12 | parquet / none / none | 9.30   | 9.37        |   -0.81%   |   0.19%    |   1.81%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q6  | parquet / none / none | 4.56   | 4.60        |   -0.84%   |   0.60%    |   0.81%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q1  | parquet / none / none | 27.55  | 27.84       |   -1.03%   |   1.32%    |   0.92%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q4  | parquet / none / none | 7.79   | 7.88        |   -1.12%   |   1.10%    |   0.27%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q21 | parquet / none / none | 67.65  | 69.84       |   -3.14%   |   0.25%    |   0.13%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q16 | parquet / none / none | 5.45   | 5.67        |   -3.73%   |   0.96%    |   1.46%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q9  | parquet / none / none | 33.72  | 36.25       |   -6.96%   |   0.54%    |   0.27%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q19 | parquet / none / none | 10.22  | 11.06       |   -7.60%   |   0.23%    |   0.63%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q3  | parquet / none / none | 11.86  | 13.32       |   -10.96%  |   0.62%    |   1.00%        | 1           | 5     |
      | TPCH(_60) | TPCH-Q7  | parquet / none / none | 35.72  | 45.17       | I -20.93%  |   0.47%    |   6.56%        | 1           | 5     |
      +-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
      
      (R) Regression: TPCH(_60) TPCH-Q18 [parquet / none / none] (56.09s -> 86.46s [+54.15%])
      +--------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+--------+-----------+
      | Operator     | % of Query | Avg    | Base Avg | Delta(Avg) | StdDev(%)  | Max    | Base Max | Delta(Max) | #Hosts | #Rows  | Est #Rows |
      +--------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+--------+-----------+
      | 13:AGGREGATE | 56.02%     | 61.84s | 27.05s   | +128.61%   |   9.36%    | 71.87s | 27.37s   | +162.55%   | 1      | 3.85K  | 8.60M     |
      | 04:AGGREGATE | 15.87%     | 17.52s | 23.23s   | -24.59%    | * 30.12% * | 26.50s | 23.43s   | +13.09%    | 1      | 90.00M | 86.02M    |
      | 05:HASH JOIN | 18.97%     | 20.95s | 23.27s   | -9.99%     | * 14.01% * | 26.01s | 23.56s   | +10.37%    | 1      | 92.82K | 360.01M   |
      | 02:SCAN HDFS | 4.13%      | 4.56s  | 4.62s    | -1.31%     |   1.14%    | 4.64s  | 4.71s    | -1.42%     | 1      | 92.82K | 360.01M   |
      +--------------+------------+--------+----------+------------+------------+--------+----------+------------+--------+--------+-----------+
      
      (I) Improvement: TPCH(_60) TPCH-Q7 [parquet / none / none] (45.17s -> 35.72s [-20.93%])
      +--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
      | Operator     | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Rows   | Est #Rows |
      +--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
      | 11:AGGREGATE | 4.63%      | 1.75s    | 1.80s    | -2.69%     |   0.30%   | 1.76s    | 1.86s    | -5.38%     | 1      | 4       | 1.64M     |
      | 10:HASH JOIN | 10.12%     | 3.83s    | 4.33s    | -11.56%    |   0.50%   | 3.86s    | 4.50s    | -14.20%    | 1      | 350.34K | 36.00M    |
      | 09:HASH JOIN | 5.43%      | 2.06s    | 2.52s    | -18.44%    |   1.58%   | 2.11s    | 2.66s    | -20.51%    | 1      | 109.37M | 36.00M    |
      | 08:HASH JOIN | 9.95%      | 3.77s    | 5.09s    | -25.84%    |   0.31%   | 3.79s    | 5.34s    | -29.10%    | 1      | 109.37M | 36.00M    |
      | 03:SCAN HDFS | 2.47%      | 935.77ms | 947.71ms | -1.26%     |   1.11%   | 945.37ms | 969.54ms | -2.49%     | 1      | 9.00M   | 9.00M     |
      | 07:HASH JOIN | 34.60%     | 13.11s   | 19.43s   | -32.54%    |   0.51%   | 13.21s   | 21.49s   | -38.55%    | 1      | 109.37M | 36.00M    |
      | 14:EXCHANGE  | 2.50%      | 947.18ms | 931.62ms | +1.67%     |   1.96%   | 978.10ms | 948.27ms | +3.14%     | 1      | 90.00M  | 90.00M    |
      | 02:SCAN HDFS | 3.14%      | 1.19s    | 1.20s    | -0.97%     |   0.69%   | 1.20s    | 1.22s    | -1.35%     | 1      | 90.00M  | 90.00M    |
      | 06:HASH JOIN | 23.24%     | 8.81s    | 9.67s    | -8.92%     |   0.52%   | 8.87s    | 10.16s   | -12.71%    | 1      | 109.37M | 36.00M    |
      | 00:SCAN HDFS | 2.32%      | 879.94ms | 890.98ms | -1.24%     |   1.29%   | 891.75ms | 892.31ms | -0.06%     | 1      | 600.00K | 600.00K   |
      +--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
      
      (V) Significant Variability: TPCH(_60) TPCH-Q18 [parquet / none / none] (0.60% -> 12.70%)
      +--------------+------------+-----------+----------------+------------------+--------+--------+-----------+
      | Operator     | % of Query | StdDev(%) | Base StdDev(%) | Delta(StdDev(%)) | #Hosts | #Rows  | Est #Rows |
      +--------------+------------+-----------+----------------+------------------+--------+--------+-----------+
      | 04:AGGREGATE | 15.87%     | 30.12%    | 0.65%          | +4526.94%        | 1      | 90.00M | 86.02M    |
      | 05:HASH JOIN | 18.97%     | 14.01%    | 0.98%          | +1323.57%        | 1      | 92.82K | 360.01M   |
      +--------------+------------+-----------+----------------+------------------+--------+--------+-----------+
      
      

            People

            • Assignee:
              tarmstrong Tim Armstrong
              Reporter:
              tarmstrong Tim Armstrong