Through code inspection, we found that the BatchWriter bins mutations inside a synchronized block that also covers calls to addMutation, so threads calling addMutation are blocked while binning is in progress. Binning potentially involves lookups of tablet metadata and processes a fair amount of information. We should get better parallelism if we either release the lock while binning, dedicate another thread to the binning, or have one of the send threads do the binning.
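As a rough illustration of the first option (releasing the lock while binning), here is a minimal sketch. It is not the BatchWriter's actual code: the class `UnlockedBinningSketch`, the `drainAndBin` method, and the `locator` function (row to tablet server name) are hypothetical stand-ins, and mutations are reduced to plain row strings. The idea is to hold the lock only long enough to swap out the pending buffer, then do the expensive binning on the detached batch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the writer's mutation buffer; not Accumulo's class. */
public class UnlockedBinningSketch {
    private final Object lock = new Object();
    private List<String> pending = new ArrayList<>();
    // Assumption: stands in for the (possibly slow) tablet metadata lookup.
    private final Function<String, String> locator;

    public UnlockedBinningSketch(Function<String, String> locator) {
        this.locator = locator;
    }

    /** Callers only contend on a short append, never on binning. */
    public void addMutation(String row) {
        synchronized (lock) {
            pending.add(row);
        }
    }

    /** Swaps the buffer under the lock, then bins the detached batch unlocked. */
    public Map<String, List<String>> drainAndBin() {
        List<String> batch;
        synchronized (lock) {
            batch = pending;
            pending = new ArrayList<>();
        }
        // No lock is held here, so addMutation can proceed concurrently.
        Map<String, List<String>> bins = new HashMap<>();
        for (String row : batch) {
            bins.computeIfAbsent(locator.apply(row), k -> new ArrayList<>()).add(row);
        }
        return bins;
    }
}
```

The same buffer swap would also fit the other two options: whichever thread calls `drainAndBin` (a dedicated binning thread or a send thread) does the lookup work without blocking writers.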
This has not been verified empirically, so there is no profiling data yet to indicate how much improvement we should expect. The first step on this ticket should be to profile the BatchWriter and produce a repeatable demonstration of the bottleneck.
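One possible shape for a repeatable demonstration is a small harness that releases several writer threads at once and times how long they take to push a fixed number of mutations. The sketch below is only an assumption about how such a test could look: `AddMutationContentionHarness` and `run` are hypothetical names, and the `addMutation` parameter is a plain callback that a real test would wire to an actual BatchWriter.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/** Hypothetical timing harness; the callback stands in for BatchWriter.addMutation. */
public class AddMutationContentionHarness {

    /** Returns elapsed nanoseconds for `threads` writers each making `perThread` calls. */
    public static long run(int threads, int perThread, Consumer<String> addMutation) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                try {
                    start.await(); // all writers begin together
                    for (int i = 0; i < perThread; i++) {
                        addMutation.accept("row_" + id + "_" + i);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        long begin = System.nanoTime();
        start.countDown();
        try {
            done.await(); // wait for every writer to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        } finally {
            pool.shutdown();
        }
        return System.nanoTime() - begin;
    }
}
```

Comparing the elapsed time for the same total number of mutations at one writer thread and at several would give a first indication of whether addMutation callers are serializing behind the binning lock. A profiler would still be needed to confirm where the time goes.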