Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
- Affects Version: Impala 2.3.0
Description
I noticed this issue with the Kudu scan node implementation, but I imagine it could happen with HDFS as well:
- on a big box, we started up 48 scanner threads for a 'SELECT COUNT' query
- the underlying batches that are read from Kudu return a few million rows each (because it's trying to do large IOs to amortize round-trips, and the projection is empty for COUNT)
- each scanner thread chops these Kudu batches into RowBatches of 1000 rows each and pushes those onto the RowBatchQueue
Because each backend IO (scan RPC to Kudu) results in thousands of Impala RowBatches, we end up with the main thread pulling round-robin from all of the scanner threads, rather than exhausting one Kudu batch before moving to the next. As a result, we see the following:
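A back-of-the-envelope sketch of that fan-out, using assumed numbers for illustration (one scan RPC returning ~3M rows, which is plausible when the projection is empty for COUNT; the 1000-row RowBatch size and 48 threads are from the report above):

```cpp
#include <cassert>

// Assumed size of one Kudu scan RPC result, for illustration only.
constexpr long kRowsPerKuduBatch = 3000000;
// RowBatch capacity and thread count from the report.
constexpr long kRowsPerRowBatch = 1000;
constexpr int kScannerThreads = 48;

// Each backend IO fans out into thousands of small RowBatches...
constexpr long kRowBatchesPerIo = kRowsPerKuduBatch / kRowsPerRowBatch;

// ...so across 48 threads the consumer has on the order of 144k queued
// batches to pull round-robin before any thread issues its next scan RPC.
constexpr long kQueuedBatchesAcrossThreads = kRowBatchesPerIo * kScannerThreads;
```

This is why the consumer interleaves across all producers for a long stretch: every thread's buffer shrinks at roughly the same rate.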
- when the query starts, all 48 threads hammer the Kudu server with scan RPCs
- the Kudu server is then completely quiet for ~30 seconds while the scanner threads drain their buffers
- all of the buffers empty at essentially the same time, and we get another herd of IO on the Kudu side
It would be preferable to make the RowBatchQueue "unfair" in some way, such that the main thread exhausts entire IO buffers at a time, rather than pulling little bits from each of the threads in a round-robin fashion.