Tests show that Kafka's million-level TPS is mainly due to batching. When the batch size is set to 1, the TPS drops by an order of magnitude. So I am trying to add this feature to RocketMQ.
To keep the effort minimal, it works as follows:
Add only synchronous send functions to the MQProducer interface, such as send(final Collection msgs).
Use MessageBatch, which extends Message and implements Iterable<Message>.
Use a byte buffer instead of a list of objects to avoid excessive GC in the Broker.
Move the decode and encode logic out of lockForPutMessage to reduce contention on the lock.
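The points above can be illustrated with a minimal sketch. It is not the actual RocketMQ code: the Message class here is a hypothetical stand-in that models only a topic and a body, and the of() factory, the single-topic restriction, the countRecords() helper, and the [4-byte length][body] layout are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for RocketMQ's Message; only topic and body are modeled.
class Message {
    private final String topic;
    private final byte[] body;

    Message(String topic, byte[] body) {
        this.topic = topic;
        this.body = body;
    }

    String getTopic() { return topic; }
    byte[] getBody() { return body; }
}

// A batch is itself a Message, so it can go through an existing send path.
class MessageBatch extends Message implements Iterable<Message> {
    private final List<Message> messages;

    private MessageBatch(String topic, List<Message> messages) {
        super(topic, null);
        this.messages = messages;
    }

    // Builds a batch; rejecting empty input and mixed topics is an assumed rule.
    static MessageBatch of(Collection<Message> msgs) {
        if (msgs.isEmpty()) {
            throw new IllegalArgumentException("empty batch");
        }
        List<Message> copy = new ArrayList<>(msgs);
        String topic = copy.get(0).getTopic();
        for (Message m : copy) {
            if (!topic.equals(m.getTopic())) {
                throw new IllegalArgumentException("mixed topics in one batch");
            }
        }
        return new MessageBatch(topic, copy);
    }

    @Override
    public Iterator<Message> iterator() {
        return messages.iterator();
    }

    // Packs all bodies into one buffer as [4-byte length][body] records
    // (an illustrative layout, not RocketMQ's wire format).
    ByteBuffer encode() {
        int size = 0;
        for (Message m : messages) {
            size += 4 + m.getBody().length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (Message m : messages) {
            buf.putInt(m.getBody().length);
            buf.put(m.getBody());
        }
        buf.flip();
        return buf;
    }

    // Walks the buffer record by record without creating Message objects,
    // which is the idea behind keeping bytes instead of an object list.
    static int countRecords(ByteBuffer encoded) {
        ByteBuffer view = encoded.duplicate();
        int count = 0;
        while (view.hasRemaining()) {
            int len = view.getInt();
            view.position(view.position() + len);
            count++;
        }
        return count;
    }
}
```

In this sketch the encoding happens on the caller's side before any lock would be taken, so the critical section only needs to copy the already-encoded bytes.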
On Linux with 24 cores, 48 GB of RAM, and an SSD, using 50 threads to send messages with 50-byte bodies in batches of 50, we get about 1.5 million TPS until the disk is full.
Although messages can accumulate in the Broker very quickly, dispatching them to the consume queue takes time and is much slower than accepting them. So the messages may not be available for consumption immediately.
We may need to refactor the ReputMessageService to solve this problem.
If anyone has ideas, please let me know or share them in this issue.