Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Impala 2.2, Impala 2.3.0, Impala 2.5.0
Description
We have observed a number of problems with the way Impala dynamically creates scanner threads, where more scanner threads are created than is ideal.
- The scanner memory heuristic can lead to excessive memory consumption, especially for very selective scans with wide rows. The current heuristic for limiting memory consumption does not handle these cases well. There are likely several interlinked causes, which need further investigation.
- The non-deterministic scanner thread heuristic can lead to a great deal of performance variability. At a minimum, the number of scanner threads should always converge to the same number for the same plan and data if the query is the only one running on the cluster.
- Beyond a certain point, adding scanner threads does not improve performance (and can degrade it), but the heuristic keeps spinning up scanner threads as long as tokens and memory are available.
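To make the desired properties concrete, here is a minimal sketch of a thread-count rule that is deterministic and capped. It is not Impala's implementation: the `ScanResources` fields, the `target_scanner_threads` function, and the idea of a fixed `max_useful_threads` cap are all assumptions made for illustration. The point is only that the result depends on the inputs alone, so the same plan and data on an otherwise idle cluster would always yield the same count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResources:
    """Hypothetical inputs; none of these names come from Impala."""
    available_tokens: int      # thread tokens the scan node may claim
    mem_budget_bytes: int      # memory the scan node is allowed to use
    est_bytes_per_thread: int  # estimated memory one scanner thread needs
    max_useful_threads: int    # assumed cap beyond which threads stop helping


def target_scanner_threads(res: ScanResources) -> int:
    """Return a thread count that depends only on the inputs, not on timing."""
    if res.est_bytes_per_thread <= 0:
        raise ValueError("est_bytes_per_thread must be positive")
    # How many threads fit in the memory budget.
    by_memory = res.mem_budget_bytes // res.est_bytes_per_thread
    # The tightest of the three limits wins.
    target = min(res.available_tokens, by_memory, res.max_useful_threads)
    # Always keep one thread so the scan can make progress.
    return max(1, target)
```

With wide rows, `est_bytes_per_thread` would be large, so the memory term would limit the thread count even when many tokens are free. How to estimate that value and how to choose the cap are exactly the open questions this issue raises.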
Issue Links
- is related to
  - IMPALA-2834 TestNestedTypes.test_tpch fails with memory limit exceeded (Resolved)
  - IMPALA-3209 Optimize scanner memory usage and make scan nodes adhere to a memory constraint (Resolved)