Steps to Reproduce:
1. Create a non-compacted topic with 1 partition
2. Set a produce quota of 512 KB/s
3. Send messages at 20 MB/s
4. Observe heap memory growth as time progresses
While running performance tests with a user configured with a produce quota, we found that the lead broker serving the requests would exhaust its heap memory if the producer sustained an inbound request throughput greater than the produce quota.
Upon further investigation, we took a heap dump from that broker process and discovered that the ThrottledResponse object has an indirect reference to the byte[] holding the messages associated with the ProduceRequest.
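To illustrate the retention pattern we believe the heap dump shows, here is a minimal standalone sketch. It is not Kafka's code: `DelayedResponse`, `simulate`, and the callback shape are hypothetical stand-ins, and it assumes throttled responses wait in a delay queue while their completion callback still captures the request payload.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

public class ThrottleRetentionSketch {

    /** Hypothetical stand-in for a throttled response; not Kafka's actual class. */
    static final class DelayedResponse implements Delayed {
        private final long expiresAtNanos;
        final int payloadBytes;
        // Assumption: the completion callback captures the request payload,
        // so the byte[] stays reachable for as long as this object is queued.
        private final Runnable callback;

        DelayedResponse(long delayMs, byte[] requestPayload) {
            this.expiresAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
            this.payloadBytes = requestPayload.length;
            this.callback = () -> { int ignored = requestPayload.length; };
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(expiresAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    /**
     * Enqueues `requests` throttled responses, each delayed by `delayMs`, and
     * returns how many payload bytes are still reachable through the queue.
     */
    static long simulate(int requests, int payloadBytes, long delayMs) {
        DelayQueue<DelayedResponse> queue = new DelayQueue<>();
        for (int i = 0; i < requests; i++) {
            queue.add(new DelayedResponse(delayMs, new byte[payloadBytes]));
            // Release every response whose throttle delay has already elapsed.
            while (queue.poll() != null) { }
        }
        long retained = 0;
        for (DelayedResponse response : queue) {
            retained += response.payloadBytes;
        }
        return retained;
    }

    public static void main(String[] args) {
        // 100 requests of 64 KiB each, throttled for 60 s: all payloads stay queued.
        System.out.println("retained bytes: " + simulate(100, 64 * 1024, 60_000));
    }
}
```

In this model, when requests arrive faster than their throttle delays expire, the retained bytes grow with the number of queued responses. That matches the heap growth we observed, though the real reference chain in the broker may differ.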
We're happy to contribute a patch, but in the meantime we wanted to raise the issue first and get feedback from the community.