While investigating HBase performance, I found a bottleneck caused by
Nagle's algorithm. For some reads I would get a bimodal distribution
of read times, with about half the times being around 20 ms, and half
around 200ms. I tracked this down to the well-known interaction between
Nagle's algorithm and TCP delayed acknowledgments.
I found that calling setTcpNoDelay(true) on the server's socket
connection dropped all of my read times back to a constant 20 ms.
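
For reference, here is a minimal sketch of the call in question. It uses plain java.net sockets on the loopback interface rather than the actual HBase IPC classes, and the class and method names (NoDelayExample, connectWithNoDelay) are made up for illustration. It only shows where the option is set on each end of a connection; it does not reproduce the latency problem itself.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class NoDelayExample {
    /**
     * Opens a loopback connection, applies the given TCP_NODELAY setting to
     * both ends, and returns the value each socket reports afterwards
     * (index 0 = client side, index 1 = server side).
     */
    static boolean[] connectWithNoDelay(boolean noDelay) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort());
             Socket accepted = listener.accept()) {
            // true disables Nagle's algorithm, so small writes are sent
            // immediately instead of waiting for an outstanding ACK.
            client.setTcpNoDelay(noDelay);
            accepted.setTcpNoDelay(noDelay);
            return new boolean[] { client.getTcpNoDelay(), accepted.getTcpNoDelay() };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] flags = connectWithNoDelay(true);
        System.out.println("client=" + flags[0] + " server=" + flags[1]);
    }
}
```

In a real server the option would be set on each socket returned by accept(), since it is a per-connection setting.
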
I propose a patch to make the TCP_NODELAY option configurable. The
attached patch allows one to set the TCP_NODELAY option on both the
client and the server side. Currently it defaults to false
(i.e., with Nagle's algorithm enabled).
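
As a rough illustration of what "configurable, defaulting to false" could look like, the sketch below reads a boolean flag from a java.util.Properties object. The property names and the TcpNoDelayConfig class are hypothetical and are not taken from the patch; the real patch presumably reads the value from the project's own configuration object.

```java
import java.util.Properties;

public class TcpNoDelayConfig {
    // Hypothetical key names, for illustration only; not the names used by the patch.
    static final String CLIENT_KEY = "example.ipc.client.tcpnodelay";
    static final String SERVER_KEY = "example.ipc.server.tcpnodelay";

    /** Returns the configured flag, or false (Nagle's algorithm enabled) when unset. */
    static boolean tcpNoDelay(Properties conf, String key) {
        return Boolean.parseBoolean(conf.getProperty(key, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(CLIENT_KEY, "true"); // opt in on the client side only
        System.out.println("client tcpNoDelay: " + tcpNoDelay(conf, CLIENT_KEY));
        System.out.println("server tcpNoDelay: " + tcpNoDelay(conf, SERVER_KEY));
    }
}
```

The value read this way would then be passed to setTcpNoDelay() when each socket is created or accepted, so that existing deployments keep the current behavior unless they opt in.
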
To see the effect, I have included a test which provokes the issue by
sending a MapWritable over an IPC call. On my machine this test shows
a speedup of 117 times when using TCP_NODELAY.
These tests were done on OS X 10.4. Your mileage may vary with other
TCP/IP stack implementations.