The Kudu client may consume non-trivial amounts of memory that are not accounted for in the query MemTracker, so the process could run out of memory.
In particular, we need to consider the following in the KuduTableSink:
- Buffer space for write ops to be sent, which is 100MB by default and is configurable via a flag.
- Per-row errors observed by the client, which are held until Impala fetches and deletes them. Each error contains a string and a copy of the row. The client API indicates that error collection is bounded (it can report an overflow), but the implementation does not yet limit the number of errors, so this memory could grow without bound.
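As a rough illustration of how the two items above could be charged against a tracker up front, here is a minimal sketch. All names in it (SimpleMemTracker, EstimateErrorBytes, ReserveSinkMemory) are hypothetical and exist only for this example; they are not Impala's MemTracker or any Kudu client API. The cap on pending errors is an assumption too, since the client does not currently enforce one.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a query-level memory tracker.
// This is NOT Impala's MemTracker; it only models a byte budget.
class SimpleMemTracker {
 public:
  explicit SimpleMemTracker(int64_t limit_bytes) : limit_(limit_bytes) {}

  // Reserves 'bytes' if that stays within the limit; returns false otherwise.
  bool TryConsume(int64_t bytes) {
    if (bytes < 0 || consumed_ + bytes > limit_) return false;
    consumed_ += bytes;
    return true;
  }

  // Returns 'bytes' to the budget, clamping at zero.
  void Release(int64_t bytes) {
    consumed_ -= bytes;
    if (consumed_ < 0) consumed_ = 0;
  }

  int64_t consumed() const { return consumed_; }

 private:
  const int64_t limit_;
  int64_t consumed_ = 0;
};

// Rough size of one per-row error: the message string plus a copy of the row.
inline int64_t EstimateErrorBytes(const std::string& message, int64_t row_bytes) {
  return static_cast<int64_t>(message.size()) + row_bytes;
}

// Reserves the client write buffer plus an allowance for pending errors.
// 'max_pending_errors' is an assumed cap, not something the client enforces.
inline bool ReserveSinkMemory(SimpleMemTracker* tracker,
                              int64_t write_buffer_bytes,
                              int64_t max_pending_errors,
                              int64_t bytes_per_error) {
  const int64_t total = write_buffer_bytes + max_pending_errors * bytes_per_error;
  return tracker->TryConsume(total);
}
```

With the 100MB default buffer, a sink would reserve the buffer plus its error allowance before opening a session, and fail the query early if the reservation does not fit rather than risk exhausting process memory later.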
We also need to understand whether the KuduScanNode makes any non-negligible memory allocations.