Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
As of IMPALA-8780, the BufferedPlanRootSink returns an error whenever a client sets the fetch size to a value lower than BATCH_SIZE. The issue is that a RowBatch read from the queue might contain more rows than the client requested. The BufferedPlanRootSink therefore needs to be able to read a RowBatch partially and remember how far into the batch it has read. Furthermore, num_results in BufferedPlanRootSink::GetNext can be lower than BATCH_SIZE if the query results cache in ClientRequestState has a cache hit (which only happens if the client cursor is reset).
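The partial-read behavior described above could look roughly like the following. This is a minimal sketch, not Impala's implementation: `Batch`, `PartialBatchReader`, `AddBatch`, and `current_row_` are hypothetical names, and a batch is modeled as a plain vector of integers rather than a real RowBatch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical stand-in for a RowBatch: each int represents one row.
using Batch = std::vector<int>;

// Hypothetical reader that can return part of a batch and resume later.
class PartialBatchReader {
 public:
  // Queues a batch for later reads; empty batches are ignored.
  void AddBatch(Batch batch) {
    if (!batch.empty()) queue_.push_back(std::move(batch));
  }

  // Appends up to 'num_results' rows from the front batch to 'out' and
  // returns how many rows were appended. Reads from at most one batch per
  // call, and remembers the position inside a partially consumed batch.
  std::size_t GetNext(std::size_t num_results, std::vector<int>* out) {
    assert(out != nullptr);
    if (queue_.empty()) return 0;
    const Batch& front = queue_.front();
    const std::size_t remaining = front.size() - current_row_;
    const std::size_t n = std::min(num_results, remaining);
    out->insert(out->end(), front.begin() + current_row_,
                front.begin() + current_row_ + n);
    current_row_ += n;
    // Only drop the batch once every row in it has been returned.
    if (current_row_ == front.size()) {
      queue_.pop_front();
      current_row_ = 0;
    }
    return n;
  }

 private:
  std::deque<Batch> queue_;
  std::size_t current_row_ = 0;  // Index of the next unread row in the front batch.
};
```

With a five-row batch queued, three successive calls with `num_results` of 2 would return 2, 2, and 1 rows, and the batch leaves the queue only after the last call.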
Another issue is that the BufferedPlanRootSink can read at most a single RowBatch at a time. So if a fetch size larger than BATCH_SIZE is specified, only BATCH_SIZE rows are written to the given QueryResultSet. This is consistent with the legacy behavior of PlanRootSink (now BlockingPlanRootSink), but it is not ideal because clients can then read only BATCH_SIZE rows per fetch. A higher fetch size could reduce the number of round-trips needed between the client and the coordinator, which could improve fetch performance (but only if the BlockingPlanRootSink is capable of filling all the requested rows).
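To illustrate the point about larger fetch sizes, the sketch below drains several queued batches in one call and counts the round-trips needed for a given number of rows per fetch. It is an illustration under stated assumptions, not Impala's code: `RowBatchSketch`, `FillFromQueue`, `front_offset`, and `RoundTrips` are hypothetical names, and rows are modeled as integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a RowBatch: each int represents one row.
using RowBatchSketch = std::vector<int>;

// Copies rows from the queued batches into 'out' until 'fetch_size' rows
// have been copied or the queue is empty. '*front_offset' tracks how many
// rows of the front batch were already consumed by earlier calls.
std::size_t FillFromQueue(std::deque<RowBatchSketch>* queue,
                          std::size_t* front_offset, std::size_t fetch_size,
                          std::vector<int>* out) {
  std::size_t copied = 0;
  while (copied < fetch_size && !queue->empty()) {
    const RowBatchSketch& front = queue->front();
    const std::size_t n =
        std::min(fetch_size - copied, front.size() - *front_offset);
    out->insert(out->end(), front.begin() + *front_offset,
                front.begin() + *front_offset + n);
    copied += n;
    *front_offset += n;
    // Move on to the next batch once the front one is fully consumed.
    if (*front_offset == front.size()) {
      queue->pop_front();
      *front_offset = 0;
    }
  }
  return copied;
}

// Number of fetch round-trips needed to read 'total_rows' rows when each
// fetch returns at most 'rows_per_fetch' rows (ceiling division).
std::size_t RoundTrips(std::size_t total_rows, std::size_t rows_per_fetch) {
  assert(rows_per_fetch > 0);
  return (total_rows + rows_per_fetch - 1) / rows_per_fetch;
}
```

As a worked example, assuming batches of 1024 rows, reading 10240 rows takes `RoundTrips(10240, 1024)` = 10 fetches when each fetch is capped at one batch, but only `RoundTrips(10240, 4096)` = 3 fetches if a fetch size of 4096 can be filled completely.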
Issue Links
- causes
  - IMPALA-8939 TestResultSpooling.test_full_queue_large_fetch is flaky (Resolved)
- is related to
  - IMPALA-7312 Non-blocking mode for Fetch() RPC (Resolved)
  - IMPALA-1618 Impala server should always try to fulfill requested fetch size (Resolved)