Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
ghx-label-6
Description
Currently scratch_limit provides a per-query limit on scratch space usage. A global or per-pool default scratch_limit can be set to provide some degree of control over how much scratch space different queries can use.
However there are limitations here:
- We cannot directly control the aggregate consumption of all queries or all queries in a pool. E.g. limit "ad-hoc" queries to 50GB total scratch space or limit all Impala queries to 200GB total scratch space.
- We cannot control how much space Impala will use on each disk. There may be some reason to have different limits on different disks, E.g. 10GB on /data1 and 20GB on /data2. Or we may simply want to avoid skew if one disk goes offline. E.g. if you have a 20GB aggregate limit, and one disk fails, then the effective limit for the others disks increases.
The design will require some significant thought to get the knobs right. E.g. one option is to allow global limits per-disk and per-pool limits for aggregate use across all disks.