Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.15.0
Fix Version/s: None
Description
As of version 1.15.0, neither kudu-tserver nor kudu-master takes current memory usage into account when admitting requests into the RPC queue. The only limit checked by ServicePool::QueueInboundCall() is the current length of the RPC service queue, which is capped by the --rpc_service_queue_length flag.
Given that the size of an incoming request might go as high as --rpc_max_message_size (50MiB by default) and --rpc_service_queue_length might be set high to accommodate a surge of incoming requests, Kudu servers might go beyond the hard memory limit controlled by the --memory_limit_hard_bytes flag. Also, the Raft prepare queue doesn't seem to expose a limit on the total size of requests accumulated in the queue. If a Kudu server consumes too much memory, it might exit unexpectedly, either because it is killed by the OOM killer or because operator new throws std::bad_alloc and the C++ runtime terminates the process with SIGABRT, since memory allocation failures are not handled in the Kudu code.
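To make the concern concrete, here is a rough sketch of the worst-case arithmetic: the bytes held by queued requests if every queue slot contains a maximum-size message, compared against a hard memory limit. The function names are hypothetical and are not part of Kudu. The bound also ignores per-request overhead and the Raft prepare queue, so real usage could be higher.

```cpp
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;

// Upper bound on the bytes held by queued requests, assuming every slot of
// the queue holds a message of the maximum allowed size.
// (Hypothetical helper for illustration; not a Kudu function.)
constexpr std::uint64_t WorstCaseQueueBytes(std::uint64_t queue_length,
                                            std::uint64_t max_message_bytes) {
  return queue_length * max_message_bytes;
}

// Returns true if a full queue of maximum-size requests would, on its own,
// exceed the given hard memory limit.
constexpr bool QueueCanExceedLimit(std::uint64_t queue_length,
                                   std::uint64_t max_message_bytes,
                                   std::uint64_t hard_limit_bytes) {
  return WorstCaseQueueBytes(queue_length, max_message_bytes) >
         hard_limit_bytes;
}
```

For example, with an illustrative queue length of 1000 and the 50MiB default message size, the bound is 1000 * 50MiB, or roughly 48.8GiB, which would exceed a 32GiB hard limit before any other server memory is counted.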
At least, we saw evidence of such a situation when disk IO was very slow and kudu-tserver had accumulated many requests in its prepare queue (probably due to a workload pattern that first sent many small write requests and then followed up with big ones).
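One possible direction, shown here only as a minimal sketch and not as Kudu's actual code or a proposed patch, is to bound a queue by total queued bytes in addition to the number of entries. The types InboundRequest and ByteBoundedQueue are hypothetical stand-ins. The sketch counts only payload bytes, whereas a real fix would likely need to consult the server's overall memory accounting.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an inbound RPC; not a Kudu type.
struct InboundRequest {
  std::vector<char> payload;
};

// A queue that rejects a request when either the entry limit or the byte
// budget would be exceeded. Illustrative only.
class ByteBoundedQueue {
 public:
  ByteBoundedQueue(std::size_t max_items, std::size_t max_bytes)
      : max_items_(max_items), max_bytes_(max_bytes) {}

  // Returns false if admitting the request would overflow the queue; the
  // caller would then reject the call (for example, as "server too busy").
  bool TryEnqueue(InboundRequest req) {
    const std::size_t size = req.payload.size();
    std::lock_guard<std::mutex> guard(lock_);
    if (queue_.size() >= max_items_) return false;
    // queued_bytes_ never exceeds max_bytes_, so the subtraction is safe.
    if (size > max_bytes_ - queued_bytes_) return false;
    queued_bytes_ += size;
    queue_.push_back(std::move(req));
    return true;
  }

  // Pops the oldest request and releases its share of the byte budget.
  bool TryDequeue(InboundRequest* out) {
    std::lock_guard<std::mutex> guard(lock_);
    if (queue_.empty()) return false;
    *out = std::move(queue_.front());
    queue_.pop_front();
    queued_bytes_ -= out->payload.size();
    return true;
  }

  std::size_t queued_bytes() const {
    std::lock_guard<std::mutex> guard(lock_);
    return queued_bytes_;
  }

 private:
  const std::size_t max_items_;
  const std::size_t max_bytes_;
  mutable std::mutex lock_;
  std::deque<InboundRequest> queue_;
  std::size_t queued_bytes_ = 0;
};
```

With a byte budget in place, a burst of small requests followed by large ones would be rejected at admission time once the budget is used up, instead of accumulating in memory while slow disk IO delays draining the queue.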