Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Later
-
Impala 2.10.0, Impala 2.11.0
-
None
-
ghx-label-7
Description
For each query block, Impala uses the estimated materialized size of tables as a heuristic for choosing the left-most table in a series of joins. Unfortunately, that heuristic does not factor in the number of hosts that tables will execute on. The number of hosts of the left-most table dictates the degree of inter-node parallelism.
To handle this tradeoff between parallelism and size we should use a heuristic like the one in IMPALA-5612.