Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
Public beta
-
None
Description
Currently, the server side will base the number of rows per batch to achieve a few MB per response, or a time budget, whichever comes first. But, when scanning an empty projection (for a COUNT query in Impala for example), the size budget is never achieved, so we can end up with response batches with upwards of 80M rows. This is fine on the RPC layer, but when the client tries to expand these into a vector<KuduRowResult>, we end up taking sizeof(KuduRowResult)*80M = 1.3GB of RAM.
When running COUNT queries on Impala against tables with lots of tablets, this quickly pushes the impalad up to 60GB+ RAM usage given it starts a number of scanners in parallel.