Estimates how the number of parallel batches affects the load time of the same amount of persistent data.
The lower the score (average time per load), the better.

Results in short: 1 batch per thread looks sufficient, but the x2 multiplier can be kept as a margin for network issues.

servers      - number of server nodes
maxDsOps     - streamer's batches per node
sendMsgDelay - simulated network delay of write responses

Note: numbers are printed with a decimal comma (8,879 means 8.879 s/op).

=============================================================================================================================================================
Benchmark                                               (cacheWriteMode)  (maxDsOps)  (sendMsgDelay)  (servers)  (walMode)  Mode  Cnt   Score     Error  Units

SERVERS: 2
JmhPersistentStreamerReceiverBenchmark.benchIndividual      PRIMARY_SYNC           4               3          2   LOG_ONLY  avgt    3   8,879 ±  21,115  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual      PRIMARY_SYNC           8               3          2   LOG_ONLY  avgt    3   9,516 ±  20,080  s/op
>>> [Load time stops decreasing at `CPUs * 1`]
JmhPersistentStreamerReceiverBenchmark.benchIndividual      PRIMARY_SYNC          16               3          2   LOG_ONLY  avgt    3   6,328 ±  11,400  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual      PRIMARY_SYNC          32               3          2   LOG_ONLY  avgt    3   8,272 ±  14,475  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual      PRIMARY_SYNC          64               3          2   LOG_ONLY  avgt    3   6,470 ±  10,581  s/op

=== FULL_SYNC:
>>> [The same load time]
JmhPersistentStreamerReceiverBenchmark.benchIndividual         FULL_SYNC           4               3          2   LOG_ONLY  avgt    3  11,965 ±  23,694  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual         FULL_SYNC           8               3          2   LOG_ONLY  avgt    3  10,712 ±  62,010  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual         FULL_SYNC          16               3          2   LOG_ONLY  avgt    3   9,671 ±   8,370  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual         FULL_SYNC          32               3          2   LOG_ONLY  avgt    3  11,302 ±  31,113  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual         FULL_SYNC          64               3          2   LOG_ONLY  avgt    3  10,407 ±  21,700  s/op

SERVERS: 3
>>> [The same load time]
JmhPersistentStreamerReceiverBenchmark.benchIndividual      PRIMARY_SYNC           4               3          3   LOG_ONLY  avgt    3   8,950 ±  13,192  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual      PRIMARY_SYNC           8               3          3   LOG_ONLY  avgt    3   7,891 ±  14,695  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual      PRIMARY_SYNC          16               3          3   LOG_ONLY  avgt    3   8,788 ±   1,277  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual      PRIMARY_SYNC          32               3          3   LOG_ONLY  avgt    3   9,041 ±  26,050  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual      PRIMARY_SYNC          64               3          3   LOG_ONLY  avgt    3   8,169 ±   6,059  s/op

=== FULL_SYNC:
>>> [The same load time, except for a fluctuation at maxDsOps = 64]
JmhPersistentStreamerReceiverBenchmark.benchIndividual         FULL_SYNC           4               3          3   LOG_ONLY  avgt    3  14,333 ±   2,293  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual         FULL_SYNC           8               3          3   LOG_ONLY  avgt    3  15,487 ±  12,209  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual         FULL_SYNC          16               3          3   LOG_ONLY  avgt    3  15,182 ±  27,163  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual         FULL_SYNC          32               3          3   LOG_ONLY  avgt    3  14,643 ±  19,024  s/op
JmhPersistentStreamerReceiverBenchmark.benchIndividual         FULL_SYNC          64               3          3   LOG_ONLY  avgt    3  22,955 ± 273,853  s/op
=============================================================================================================================================================

OS: Linux void 5.18.0-4-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10) x86_64 GNU/Linux

# JMH 1.13
# VM version: JDK 11.0.2, VM 11.0.2+9
# VM options: -Xms2g -Xmx2g -server -XX:+AlwaysPreTouch

CPU:
vendor_id   : AuthenticAMD
cpu family  : 25
model       : 80
model name  : AMD Ryzen 7 5800U with Radeon Graphics
cpu MHz     : 1600.000
cache size  : 512 KB
siblings    : 16
cpu cores   : 8
cpuid level : 16

Disk:
`dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync`:
    1073741824 bytes (1,1 GB, 1,0 GiB) copied, 1,25661 s, 854 MB/s
`dd if=/dev/zero of=/tmp/test2.img bs=512 count=1000 oflag=dsync`:
    512000 bytes (512 kB, 500 KiB) copied, 2,80265 s, 183 kB/s
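
The rule of thumb from the results summary (one batch per CPU thread, optionally x2 as a margin for network issues) could be sketched as a small helper. This is an illustration only: the class, method, and constant names are hypothetical and are not taken from the benchmark, and how the resulting value is passed to the streamer is not shown because it depends on the API in use.

```java
/** Hypothetical helper; all names are illustrative and not part of the benchmark. */
final class StreamerParallelOps {
    /** Assumed margin from the summary: x1 looks sufficient, x2 leaves room for network issues. */
    static final int NETWORK_MARGIN_MULTIPLIER = 2;

    private StreamerParallelOps() {
        // Utility class, no instances.
    }

    /**
     * Computes the number of streamer batches per node as CPU threads times a multiplier.
     *
     * @param cpuThreads Number of hardware threads on the node (must be positive).
     * @param multiplier Batches per thread (must be positive).
     * @return Batches per node.
     */
    static int perNodeBatches(int cpuThreads, int multiplier) {
        if (cpuThreads <= 0 || multiplier <= 0)
            throw new IllegalArgumentException("cpuThreads and multiplier must be positive");

        // multiplyExact fails loudly on overflow instead of wrapping around.
        return Math.multiplyExact(cpuThreads, multiplier);
    }

    public static void main(String[] args) {
        int cpuThreads = Runtime.getRuntime().availableProcessors();

        System.out.println("x1: " + perNodeBatches(cpuThreads, 1));
        System.out.println("x2: " + perNodeBatches(cpuThreads, NETWORK_MARGIN_MULTIPLIER));
    }
}
```

On the 16-thread CPU listed in the environment section, x1 gives 16 and x2 gives 32, which correspond to two of the measured maxDsOps values.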