The network throughput computation fails to account for the fact that multiple RPCs can be in flight in parallel. Currently, the throughput is computed as (total bytes sent / total network time), where the total network time is the sum of the network times observed for each individual RPC. Because the network times of overlapping RPCs are added together, the denominator can exceed the wall-clock time actually spent on the network, and the result is a byte-weighted blend of per-RPC rates. This is hard to interpret (or wrong?) when throughput differs drastically between destination hosts. It may be slightly easier to understand if we switch to measuring the observed network throughput of each individual RPC and record it in a summary counter or a histogram.
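To make the difference concrete, here is a minimal sketch with made-up numbers. It is not taken from the actual code base: the `Rpc` record, the function names, and the sample values are all hypothetical and only illustrate the current formula, the proposed per-RPC measurement, and (for comparison) a wall-clock-based figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rpc:
    """Hypothetical record of one completed RPC (not a real type from the code base)."""
    host: str
    bytes_sent: int
    start_s: float  # time the RPC started, in seconds
    end_s: float    # time the RPC finished, in seconds

    @property
    def network_time_s(self) -> float:
        return self.end_s - self.start_s


def aggregate_throughput(rpcs: List[Rpc]) -> float:
    """Current approach: total bytes sent / sum of per-RPC network times."""
    total_bytes = sum(r.bytes_sent for r in rpcs)
    total_time = sum(r.network_time_s for r in rpcs)
    return total_bytes / total_time


def per_rpc_throughputs(rpcs: List[Rpc]) -> List[float]:
    """Proposed approach: one throughput sample per RPC, suitable for a
    summary counter or a histogram."""
    return [r.bytes_sent / r.network_time_s for r in rpcs]


def wall_clock_throughput(rpcs: List[Rpc]) -> float:
    """For comparison only: total bytes / time during which at least one RPC
    was in flight (overlapping intervals are merged, not added)."""
    intervals = sorted((r.start_s, r.end_s) for r in rpcs)
    busy = 0.0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end:
            # Overlaps the current busy interval: extend it.
            cur_end = max(cur_end, end)
        else:
            # Gap with no RPC in flight: close the interval and start a new one.
            busy += cur_end - cur_start
            cur_start, cur_end = start, end
    busy += cur_end - cur_start
    return sum(r.bytes_sent for r in rpcs) / busy


# Two RPCs sent in parallel to hosts with very different speeds (made-up values).
RPCS = [
    Rpc(host="fast-host", bytes_sent=1000, start_s=0.0, end_s=1.0),
    Rpc(host="slow-host", bytes_sent=1000, start_s=0.0, end_s=10.0),
]

if __name__ == "__main__":
    print("aggregate:", aggregate_throughput(RPCS))
    print("per RPC:  ", per_rpc_throughputs(RPCS))
    print("wall clock:", wall_clock_throughput(RPCS))
```

In this example the aggregate formula divides 2000 bytes by 11 seconds of summed network time, even though only 10 seconds of wall-clock time elapsed, and the single number hides the 10x gap between the two hosts. The per-RPC samples keep that gap visible.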