We have several performance concerns and potential improvements for the logging subsystem. To evaluate them in a data-driven way, it would be good to have a single-machine performance test that isolates the performance of the log.
The performance optimizations we would like to evaluate include:
- Special-casing appends on a follower whose messages already have the correct offsets, to avoid decompressing and recompressing them
- Memory-mapping some or all of the segment files to improve the performance of small appends and lookups
- Supporting multiple data directories and avoiding RAID
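The first optimization depends on a cheap test of whether an incoming message set can be appended as-is. Here is a minimal sketch in Python. It assumes, without confirmation from this proposal, that a follower's set qualifies when its first offset equals the log end offset and its offsets are contiguous. offsets_already_correct is a hypothetical helper, not part of the Log API:

```python
from typing import Sequence


def offsets_already_correct(log_end_offset: int, batch_offsets: Sequence[int]) -> bool:
    """Hypothetical check: can this batch be appended without reassigning offsets?

    Assumes a batch qualifies when its first offset equals the log end offset
    and every subsequent offset increases by exactly one.
    """
    if not batch_offsets:
        return False  # nothing to append; treat as not special-cased
    if batch_offsets[0] != log_end_offset:
        return False
    # Contiguity check over adjacent pairs.
    return all(b - a == 1 for a, b in zip(batch_offsets, batch_offsets[1:]))
```

When the check passes, the append could copy the (possibly compressed) bytes directly. When it fails, it would fall back to the normal path that assigns offsets.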
A standalone tool isolates the component under test and makes profiling results easier to interpret.
This test would drive load against Log/LogManager, controlled by a set of command-line options. The command-line program could then be scripted into a suite of tests covering variations in message size, message set size, compression, number of partitions, etc.
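A suite like this could be generated from the cross product of the varied parameters, with one command line emitted per combination. The sketch below is in Python. The executable name log-perf-test, the /tmp/log-perf directory, and the codec names are placeholders, since the proposal names none of them:

```python
import itertools
import shlex

# Placeholder executable name; the proposal does not name the tool.
TOOL = "log-perf-test"


def sweep(message_sizes, set_sizes, codecs, partition_counts,
          messages=1_000_000, log_dir="/tmp/log-perf"):
    """Yield one shell-quoted command line per parameter combination."""
    for size, set_size, codec, parts in itertools.product(
            message_sizes, set_sizes, codecs, partition_counts):
        args = [TOOL,
                "--partitions", str(parts),
                "--dir", log_dir,
                "--message-size", str(size),
                "--set-size", str(set_size),
                "--compression", codec,
                "--messages", str(messages)]
        yield shlex.join(args)
```

With two values each for message size, set size, compression, and partitions, this yields 16 runs. The count grows multiplicatively with each added dimension, so a full sweep may need trimming before it is scheduled.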
Here is a proposed usage for the tool:
--partitions The number of partitions to write to
--dir The directory in which to write the log
--message-size The size of the messages
--set-size The number of messages per write
--compression The compression codec to use
--messages The number of messages to write
--readers The number of reader threads reading the data
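The proposed options map directly onto a conventional argument parser. This sketch uses Python's argparse. The default values and the writes_required helper are assumptions added for illustration and are not part of the proposal:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for the proposed options; all defaults are assumed, not specified."""
    p = argparse.ArgumentParser(description="Single-machine log performance test (sketch)")
    p.add_argument("--partitions", type=int, default=1, help="The number of partitions to write to")
    p.add_argument("--dir", required=True, help="The directory in which to write the log")
    p.add_argument("--message-size", type=int, default=100, help="The size of the messages")
    p.add_argument("--set-size", type=int, default=1, help="The number of messages per write")
    p.add_argument("--compression", default="none", help="The compression codec to use")
    p.add_argument("--messages", type=int, default=1_000_000, help="The number of messages to write")
    p.add_argument("--readers", type=int, default=0, help="The number of reader threads")
    return p


def writes_required(args: argparse.Namespace) -> int:
    """Hypothetical helper: number of append() calls, i.e. ceil(messages / set_size)."""
    if args.set_size < 1:
        raise ValueError("--set-size must be at least 1")
    return -(-args.messages // args.set_size)
```

Because each write appends one set of set-size messages, a run issues ceil(messages / set-size) appends in total. A final set may be shorter than the rest.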
The tool would capture latency and throughput for the append() and read() operations.
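Both metrics can come from the same run if every append() or read() call is timed individually. The latency distribution comes from the per-call samples, and throughput comes from total bytes over wall-clock time. The following is a minimal Python sketch. OpStats is a hypothetical name, and the wrapped op stands in for a call into Log that returns the number of bytes it handled:

```python
import math
import time
from typing import Callable, List, Optional


class OpStats:
    """Per-call latency samples and byte counts for one operation type."""

    def __init__(self) -> None:
        self.latencies_ns: List[int] = []
        self.total_bytes = 0
        self.start_ns: Optional[int] = None
        self.end_ns: Optional[int] = None

    def time_call(self, op: Callable[[], int]) -> None:
        """Run op (which returns bytes handled) and record its latency."""
        t0 = time.perf_counter_ns()
        n = op()
        t1 = time.perf_counter_ns()
        if self.start_ns is None:
            self.start_ns = t0
        self.end_ns = t1
        self.latencies_ns.append(t1 - t0)
        self.total_bytes += n

    def percentile_ns(self, p: float) -> int:
        """Nearest-rank percentile of recorded latencies, for 0 < p <= 100."""
        if not self.latencies_ns:
            raise ValueError("no samples recorded")
        ordered = sorted(self.latencies_ns)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    def throughput_mib_per_s(self) -> float:
        """Bytes handled per second of wall-clock time, in MiB/s."""
        if self.start_ns is None or self.end_ns is None:
            raise ValueError("no samples recorded")
        elapsed_s = (self.end_ns - self.start_ns) / 1e9
        if elapsed_s <= 0:
            return float("inf")
        return self.total_bytes / (1024 * 1024) / elapsed_s
```

With several --readers, each thread could keep its own OpStats and the samples could be merged afterwards. This keeps contention on a shared structure out of the timed region.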