Details
-
Improvement
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
Impala 2.12.0, Impala 3.2.0
-
None
Description
Impala's current interaction with clients is pull-based: it relies on clients fetching results to trigger the generation of more result row batches until all the result rows have been produced. If a client issues a query without fetching all the results, the query fragments continue to consume resources until the query is cancelled and unregistered for whatever reason. This is undesirable because resources are held up by misbehaving clients, and other queries may wait in admission control for an extended period of time as a result.
The high-level idea for this JIRA is for Impala to have a mode in which the result sets of queries are eagerly fetched and spooled somewhere (preferably some persistent storage). This way, the cluster's resources are freed up once all result rows have been fetched and stored in the spooling location. Incoming client fetches can then be served from the spooling location.
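To make the proposal concrete, here is a minimal Python sketch of the spooling idea, not Impala's actual implementation. All names (`SpooledResultSet`, `execute_query`, `fetch`) are hypothetical, a local temporary file stands in for the spooling location, and the `released` list stands in for freeing query fragment resources.

```python
import pickle
import tempfile
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, ...]


def execute_query(released: List[bool]) -> Iterator[List[Row]]:
    """Hypothetical producer: yields 10 rows in batches of up to 4."""
    try:
        for start in range(0, 10, 4):
            yield [(i,) for i in range(start, min(start + 4, 10))]
    finally:
        # Stands in for releasing fragment resources once all rows are produced.
        released.append(True)


class SpooledResultSet:
    """Eagerly drains a row-batch producer into a spool file (assumed design)."""

    def __init__(self, batches: Iterable[List[Row]]) -> None:
        self._file = tempfile.TemporaryFile()
        self._num_batches = 0
        # Eager: consume every batch now instead of waiting for client fetches.
        for batch in batches:
            pickle.dump(batch, self._file)
            self._num_batches += 1
        self._file.seek(0)
        self._batches_read = 0
        self._pending: List[Row] = []

    def fetch(self, max_rows: int) -> List[Row]:
        """Serve a client fetch from the spool, returning at most max_rows rows."""
        while (len(self._pending) < max_rows
               and self._batches_read < self._num_batches):
            self._pending.extend(pickle.load(self._file))
            self._batches_read += 1
        rows = self._pending[:max_rows]
        self._pending = self._pending[max_rows:]
        return rows


released: List[bool] = []
results = SpooledResultSet(execute_query(released))
first_page = results.fetch(3)
```

In this sketch the producer finishes, and its resources are marked released, before the client issues its first fetch. A slow or absent client therefore only holds the spool file, not the execution resources.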
cc'ing stakiar, twm378, joemcdonnell, lv
Issue Links
- is related to
  - IMPALA-9210 Query timeline should include entry when all query results are spooled (Open)
  - IMPALA-9339 Revise explanation of RowMaterializationTimer (Open)
  - IMPALA-4268 Rework coordinator buffering to buffer more data (Resolved)
  - IMPALA-9818 Add fetch size as option to impala shell (Resolved)
  - IMPALA-8925 Consider replacing ClientRequestState ResultCache with result spooling (Resolved)
  - IMPALA-9856 Enable result spooling by default (Resolved)
- relates to
  - IMPALA-10180 Add average size of fetch requests in runtime profile (Resolved)