There are 2 main issues.
1. The broker first decompresses and then recompresses each message (to assign new offsets) before validating the message size. The validation has to happen after recompression because recompression can change the message size. As a result, the broker can spend several seconds decompressing and recompressing an oversized message, only to drop it afterwards. During that time a request thread is tied up, which reduces the broker's request-handling capacity.
2. Both fetch and producer requests must hold the per-partition leader lock. So, if appending a producer request to the log is slow, that request keeps holding the lock and blocks other producer and fetch requests on the same partition.
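A minimal Python sketch of the ordering behind the first issue, assuming a zlib-compressed wrapper message and a made-up inner layout (8-byte offset, 4-byte length, payload). `MAX_MESSAGE_BYTES`, `encode`, `decode` and `append_compressed` are hypothetical names, not Kafka's actual code or message format. The point is only that the size check can run once recompression has produced the final bytes, so the decompress/recompress cost is paid even for a message that is then rejected.

```python
import struct
import zlib

# Hypothetical broker limit; the name and value are illustrative only.
MAX_MESSAGE_BYTES = 1_000_000

# Assumed inner layout: 8-byte offset, 4-byte payload length, then the payload.
HEADER = struct.Struct(">qi")


def encode(entries):
    """Serialize (offset, payload) pairs using the assumed header layout."""
    out = bytearray()
    for offset, payload in entries:
        out += HEADER.pack(offset, len(payload))
        out += payload
    return bytes(out)


def decode(blob):
    """Inverse of encode: parse (offset, payload) pairs back out of a blob."""
    entries, pos = [], 0
    while pos < len(blob):
        offset, length = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        entries.append((offset, blob[pos:pos + length]))
        pos += length
    return entries


def append_compressed(wrapper, next_offset, max_bytes=MAX_MESSAGE_BYTES):
    """Mimic the broker path: decompress, assign offsets, recompress, then validate size."""
    # 1. Decompress the producer's wrapper message; the cost grows with its size.
    inner = decode(zlib.decompress(wrapper))
    # 2. Replace the producer-side offsets with broker-assigned ones.
    reassigned = [(next_offset + i, payload) for i, (_, payload) in enumerate(inner)]
    # 3. Recompress; the output size can differ from the input size.
    recompressed = zlib.compress(encode(reassigned))
    # 4. Only now is the final size known, so an oversized message is
    #    rejected after all of the CPU work above has already been spent.
    if len(recompressed) > max_bytes:
        raise ValueError(f"message of {len(recompressed)} bytes exceeds {max_bytes}")
    return recompressed, next_offset + len(reassigned)


wrapper = zlib.compress(encode([(0, b"a"), (1, b"b"), (2, b"c")]))
stored, next_offset = append_compressed(wrapper, next_offset=100)
print([o for o, _ in decode(zlib.decompress(stored))], next_offset)  # [100, 101, 102] 103
```

Passing a small `max_bytes` makes the same call raise `ValueError`, but only after steps 1 to 3 have already run.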
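The lock contention in the second issue can be modeled with a single `threading.Lock` per partition, assuming the produce and fetch paths both hold that lock for their whole duration. `Partition`, `append`, `fetch` and `timed_fetch` are hypothetical names, and `time.sleep` merely stands in for a slow append; this is a sketch of the effect, not the broker's locking code.

```python
import threading
import time


class Partition:
    """Hypothetical partition whose produce and fetch paths share one leader lock."""

    def __init__(self):
        self.leader_lock = threading.Lock()
        self.log = []

    def append(self, records, append_cost_s=0.0, acquired=None):
        with self.leader_lock:
            if acquired is not None:
                acquired.set()  # signal that the lock is now held
            time.sleep(append_cost_s)  # stands in for slow decompress/recompress/write
            self.log.extend(records)

    def fetch(self, offset):
        with self.leader_lock:  # must wait for any in-progress append
            return list(self.log[offset:])


def timed_fetch(partition, offset):
    """Return the fetched records and how long the fetch took, in seconds."""
    start = time.monotonic()
    records = partition.fetch(offset)
    return records, time.monotonic() - start


if __name__ == "__main__":
    p = Partition()
    held = threading.Event()
    producer = threading.Thread(target=p.append, args=([b"m1"], 0.5, held))
    producer.start()
    held.wait()  # the slow append now owns the leader lock
    records, waited = timed_fetch(p, 0)
    producer.join()
    print(records, f"fetch waited {waited:.2f}s")
```

The fetch does almost no work of its own, so nearly all of its measured latency is time spent waiting for the append to release the lock.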