Affects Version/s: Impala 2.9.0
Fix Version/s: None
I would like to propose adding some alternative controls to the current mem_limit in admission control.  a minimum memory that each query should get for startup  a maximum memory a query may take before spilling  a maximum spill size so that one query can not take up all spill space.
Problem statement: For ad-hoc and exploratory workloads memory use is rarely known in advance and failures due to out of memory present a poor user experience as the average user has to try and retry with increasing memory limits, then when they issue their next small query it's likely their increased memory limit will reduce the concurrency for others in the pool (e.g. if they set the mem_limit to 20GB for their last query, but need only 3GB for their next query).
As an example, lets say you have a pool with 60GB of RAM and the average query uses less than 3GB with some queries using 20GB.
So for the example workload above in the current state you either choose 3GB and have a concurrency of 20 queries, or set it to 20GB to keep large queries from failing, and have only 3 queries for the 60GB pool. Any user manually increasing their limit to 20GB would greatly reduce the other number of queries that can be run (1user*20GB+13users*3GB=>14 queries at once).
With additional settings: If you have the 3 settings mentioned above, you could set the minimum memory to 1GB, (assuming this is enough for spilling to occur), this would prevent a query from failing to startup/spill when usage is at 59.5/60GB. This means you could theoretically have 60 queries instead of 20 at a time (in the previous example with mem_limit) if they were all relatively small queries. Then to avoid single 'big' queries from taking up all the memory you would set a max mem at which point spilling would occur, very similar to the current mem_limit. In our case that could be 20GB and all queries within that limit would finish quickly without spilling, and the one 'bad' query would only reduce the total concurrency of impala to 41 (1user*20GB+40users*1GB). Finally to avoid that 'bad' query from using up all the spill space a limit would be set for the max amount of spill.
I'm sure this is not the only solution, nor the best one, but wanted to share the idea as I believe it would greatly improve the user experience.