Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Impala 2.12.0
None
Description
During scale testing, we observed that the size of the accumulated row batches in the data stream sender affects Impala's performance. This is expected: a larger row batch amortizes the per-batch cost of compression and of the RPC that sends it. The default is 16KB per channel. In an experiment on a 38-node cluster with 48 concurrent users running 10TB TPC-DS, raising the default to 512KB improved queries per hour by about 5%. The buffer size trades memory consumption for performance, because each channel can hold up to that many bytes before sending. Exposing the value as a flag makes it easier to tune for performance.
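The following simplified C++ model illustrates the amortization argument. It is not Impala code: `ChannelModel` and `CountSends` are hypothetical names, rows are assumed to have a fixed size, and each flush stands in for one compression call plus one RPC. The model counts how many batches a single channel sends for a given buffer size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of one sender channel (not Impala's class).
// Rows are appended to a pending batch; once the batch reaches buffer_size
// bytes it is "sent", which stands for one compression call plus one RPC.
class ChannelModel {
 public:
  explicit ChannelModel(std::size_t buffer_size) : buffer_size_(buffer_size) {}

  // Appends one row of row_bytes bytes and flushes if the batch is full.
  void AddRow(std::size_t row_bytes) {
    pending_bytes_ += row_bytes;
    if (pending_bytes_ >= buffer_size_) Flush();
  }

  // Sends any partially filled batch at end of stream.
  void Close() {
    if (pending_bytes_ > 0) Flush();
  }

  std::size_t num_sends() const { return num_sends_; }

 private:
  void Flush() {
    ++num_sends_;  // the fixed per-batch cost is paid once here
    pending_bytes_ = 0;
  }

  std::size_t buffer_size_;
  std::size_t pending_bytes_ = 0;
  std::size_t num_sends_ = 0;
};

// Returns how many batches one channel sends for num_rows rows of row_bytes each.
std::size_t CountSends(std::size_t buffer_size, std::size_t num_rows,
                       std::size_t row_bytes) {
  ChannelModel channel(buffer_size);
  for (std::size_t i = 0; i < num_rows; ++i) channel.AddRow(row_bytes);
  channel.Close();
  return channel.num_sends();
}
```

For example, with 64 MiB of 64-byte rows (`CountSends(16 * 1024, 1 << 20, 64)` versus `CountSends(512 * 1024, 1 << 20, 64)`), a 16KB buffer sends 4096 batches while a 512KB buffer sends 128, so the fixed per-batch cost is paid 32 times less often. The model ignores the extra memory and latency that larger batches introduce.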
The buffer size is currently hard-coded to 16 * 1024 bytes in both the KRPC and non-KRPC sender paths:

if (FLAGS_use_krpc) {
  *sink = pool->Add(new KrpcDataStreamSender(fragment_instance_ctx.sender_id,
      row_desc, thrift_sink.stream_sink, fragment_ctx.destinations, 16 * 1024,
      state));
} else {
  // TODO: figure out good buffer size based on size of output row
  *sink = pool->Add(new DataStreamSender(fragment_instance_ctx.sender_id,
      row_desc, thrift_sink.stream_sink, fragment_ctx.destinations, 16 * 1024,
      state));
}
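A minimal sketch of the memory side of the tradeoff, assuming the hard-coded `16 * 1024` argument above is replaced by a configurable value. `SenderConfig`, its field `data_stream_sender_buffer_size`, and `EstimateSenderBufferBytes` are hypothetical names used for illustration, not the actual flag or API. The estimate assumes each destination channel buffers up to the configured size and ignores serialization and compression scratch space.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a command-line flag; the name is illustrative only.
struct SenderConfig {
  // Defaults to the value currently hard-coded in both sender constructors.
  std::int64_t data_stream_sender_buffer_size = 16 * 1024;
};

// Rough upper bound on bytes buffered by one sender: every destination channel
// may accumulate up to data_stream_sender_buffer_size bytes before sending.
// Serialization and compression buffers are not counted.
std::int64_t EstimateSenderBufferBytes(const SenderConfig& config,
                                       std::int64_t num_channels) {
  return config.data_stream_sender_buffer_size * num_channels;
}
```

Assuming a sender with 38 destination channels (for example, one per node in the 38-node cluster), the 16KB default buffers at most about 608KB per sender, while 512KB raises that to about 19MB. This per-sender cost is multiplied by the number of concurrent fragment instances, which is why a flag is preferable to a larger hard-coded default.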