IMHO the local time spent processing a fetch response is linear in the number of partitions in the request, while the network time spent writing to the socket buffer is not: it depends on whether the data is still in the file cache. So with either of the proposed approaches, 1) resetting the socket buffer size or 2) fetching a subset of topic-partitions at a time, we would have to either 1) set the buffer size too small, which is unfair to requests that do not hit disk I/O and may cause unnecessary round trips, or 2) fetch too small a subset of topic-partitions, which degenerates into the same problem as 1).
Capping based on time would be better, since it provides "fairness", but it seems a little hacky.
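To make the time-capping idea concrete, here is a minimal sketch (not Kafka code; the names `Partition`-style strings and `readPartitionData` are hypothetical) of assembling a fetch response under a deadline instead of a byte budget: partitions whose reads don't fit in the time window are simply deferred to the next fetch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of time-based capping for a fetch response: stop
// adding partition data once a deadline passes, rather than when a byte
// budget is exhausted. readPartitionData is a stand-in for the real log read.
public class TimeCappedFetch {
    static final long MAX_FETCH_MILLIS = 100;

    static List<byte[]> assembleResponse(List<String> partitions) {
        long deadline = System.currentTimeMillis() + MAX_FETCH_MILLIS;
        List<byte[]> chunks = new ArrayList<>();
        for (String partition : partitions) {
            if (System.currentTimeMillis() >= deadline) {
                break; // remaining partitions are served on the next fetch
            }
            chunks.add(readPartitionData(partition));
        }
        return chunks;
    }

    // Stand-in for the real log read; pretend each partition has 1 KB ready.
    static byte[] readPartitionData(String partition) {
        return new byte[1024];
    }

    public static void main(String[] args) {
        List<byte[]> resp = assembleResponse(List.of("t0-p0", "t0-p1", "t1-p0"));
        System.out.println(resp.size() + " partition chunks returned");
    }
}
```

The "fairness" here is that a request touching many cold partitions consumes at most its time slice per round trip, rather than holding resources until every partition is read.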
My reasoning for decoupling the socket and network processors is the following. As we scale up, the principle should be "clients are isolated from each other"; for fetch requests that means "if you request old data from many topic-partitions, only your own request should take a long time; other requests should not be impacted". Today a request's lifetime on the server is
socket -> network processor -> request handler -> (possible) disk I/O due to flush for produce request -> socket processor -> network I/O
and one way to enable isolation is to ensure that no hop on this path is single-threaded. Today, socket -> network processor goes through the acceptor; network processor -> request handler goes through the request queue; request handler -> (possible) disk I/O due to flush for produce requests was fixed in KAFKA-615. But socket processor -> network I/O is still coupled, and any fix for issues resulting from this coupling has to account for the worst case, which violates the "isolation" principle.
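The decoupling pattern already used at the other hops can be sketched as follows. This is illustrative only (not Kafka code): two stages hand work off through a bounded queue, so a slow downstream stage only backs up its own queue instead of stalling the upstream thread inline, which is the property the socket processor -> network I/O hop currently lacks.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch of queue-based stage decoupling, as between the
// network processors and request handlers: the upstream thread enqueues
// and moves on; the downstream thread drains at its own pace.
public class StageDecoupling {
    static int runPipeline(int numRequests) throws InterruptedException {
        BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(16);
        final int[] handled = {0};

        // Downstream stage: e.g. a request handler draining the queue.
        Thread handler = new Thread(() -> {
            try {
                for (;;) {
                    String req = requestQueue.take();
                    if (req.equals("SHUTDOWN")) return;
                    handled[0]++; // real code would process the request here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handler.start();

        // Upstream stage: e.g. a network processor enqueueing parsed requests.
        for (int i = 0; i < numRequests; i++) {
            requestQueue.put("request-" + i); // blocks only if the queue is full
        }
        requestQueue.put("SHUTDOWN");
        handler.join();
        return handled[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runPipeline(3) + " requests handled off the caller's thread");
    }
}
```

With a bounded queue, backpressure is per-hop: one slow client's work piles up behind its own stage rather than occupying the shared upstream thread, which is the isolation property argued for above.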
I agree this is rather complex and would be a long-term thing.