Currently, if a client is throttled due to a quota violation, the broker sends the response back to the client only after the throttle time has passed. The client does not know how long the response will be delayed and may hit its request timeout before the response arrives. As a result, the client retries the request, which results in an even longer throttle time.
This scenario can happen when a large group of clients sends records to the brokers. We saw it when a MapReduce job pushed data to a Kafka cluster.
To improve this, the broker can return the response, including the throttle time, immediately after processing the request. The broker then mutes the channel for this client. A correct client implementation should back off for the throttle time before sending the next request. If the client ignores the throttle time and sends the next request immediately, the channel will still be muted and the request will not be processed until the throttle time has passed.
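The client-side back-off described above could look roughly like the following. This is only a minimal sketch: the class `ThrottleTracker` and its methods are hypothetical names used for illustration and are not part of the Kafka client API. It assumes the client reads a throttle time in milliseconds from each response and tracks a per-broker deadline before which it should not send.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not an actual Kafka class.
class ThrottleTracker {
    // Earliest time (ms) at which each broker may be contacted again.
    private final Map<String, Long> throttledUntilMs = new HashMap<>();

    // Record the throttle time carried in a response from the given broker.
    void onResponse(String brokerId, long throttleTimeMs, long nowMs) {
        if (throttleTimeMs > 0) {
            // Keep the later deadline if one is already recorded.
            throttledUntilMs.merge(brokerId, nowMs + throttleTimeMs, Long::max);
        }
    }

    // Milliseconds the client should still wait before sending to this broker.
    long remainingDelayMs(String brokerId, long nowMs) {
        Long until = throttledUntilMs.get(brokerId);
        return until == null ? 0L : Math.max(0L, until - nowMs);
    }

    // True once the back-off period for this broker has elapsed.
    boolean canSend(String brokerId, long nowMs) {
        return remainingDelayMs(brokerId, nowMs) == 0L;
    }
}
```

A sender loop would call `onResponse` for every response and check `canSend` (or sleep for `remainingDelayMs`) before issuing the next request to that broker. Passing the current time in explicitly keeps the sketch easy to test.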
A KIP will follow with more details.