A scenario we've seen play out a couple of times is this.
1. An Impala admin sets up memory-based admission control but sets the default query mem_limit to 0. This means that admission control will use memory estimates instead of mem_limit. Typically admins want some protection from large queries consuming excessive memory but can't or don't want to set a single mem_limit because the workload is unknown or unpredictable. This configuration has caveats and we recommend against it, but this happens and often works well enough as long as the workload is comprised of relatively simple queries. The caveats include:
- There is no enforcement that a query stays within the memory estimate. This means that a query can fail or force other queries to fail.
- Memory estimates are often inaccurate (this is unavoidable since they depend on cardinality estimates, which are commonly off by 10x even with state-of-the-art query planners). This means that runnable queries may be impossible to admit without setting a mem_limit.
2. Something changes about the workload, e.g. a new query is added, stats are computed, data size changes, we make an otherwise-innocuous planner change. Then we run into the second problem above and one or more queries cannot be executed, e.g. "Rejected query from pool root.foo: request memory needed 1234.56 GB is greater than pool max mem resources 1000.00 GB. Use the MEM_LIMIT query option to indicate how much memory is required per node. The total memory needed is the per-node MEM_LIMIT times the number of nodes executing the query. See the Admission Control documentation for more information. "
The last problem is the problem this JIRA is intending to work around. The preferred solutions, which work most of the time, are:
- Configure a default query memory limit for the pool
- Set a mem_limit for the query
- Disable memory-based admission control and do admission control based on num_queries, which is simpler and easier to understand.
It would however, be useful to have a third fallback in cases where the first two options are difficult or impossible to apply. The basic requirement is to allow an administrator to configure a resource pool such that queries with very high estimates can run. We probably need enough flexibility that they can run concurrently with other smaller queries (i.e. the big query shouldn't take over the whole pool) and ideally we would also have a mem_limit applied to the query so that we're still protected from runaway memory consumption.
The long-term solution to this is IMPALA-6460. This is a short-term workaround that can tide users over until we have a comprehensive solution.