We're getting at most about 2500 gets/sec per region server when it serves data from the block cache, using HBase 0.20.3. All data is in the same region, and the region has just one store file for the family we're querying. HBASE-2180 didn't seem to help here. At this point the region server is using about 1.5 cores. Since we have 8 cores available, the bottleneck does not seem to be CPU.
The number of clients does not seem to matter (we tested with 1-4 clients).
The only other bottleneck that I can think of is the network, but we test on a local 1 Gbit LAN. Our web server can handle 15,000 requests/sec, so the network doesn't seem to be the limit either.
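As a rough sanity check on the network argument, here is a small sketch of how large each response would have to be for 2500 gets/sec to fill a 1 Gbit link. It assumes 1 Gbit means 10^9 bits/sec and ignores RPC and TCP overhead. The class and method names are made up for illustration, and our actual row size is not part of the calculation.

```java
public class NetworkCeiling {
    // Payload bytes per response at which the link would be saturated.
    // Assumption: protocol overhead is ignored, so this is an upper bound
    // on the payload size, not a measured value.
    public static double bytesPerGetToSaturate(double linkBitsPerSec, double getsPerSec) {
        double linkBytesPerSec = linkBitsPerSec / 8.0;
        return linkBytesPerSec / getsPerSec;
    }

    public static void main(String[] args) {
        // 1 Gbit LAN and the observed 2500 gets/sec from the test above.
        double bytes = bytesPerGetToSaturate(1_000_000_000.0, 2500.0);
        System.out.printf("%.0f bytes per get would saturate the link%n", bytes);
    }
}
```

Under these assumptions each get would have to return about 50,000 bytes before bandwidth alone became the limit, so this only rules out the network if the rows are much smaller than that.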
Maybe there is some synchronization going on that limits the CPU scalability of a single region server?
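To put numbers on that question, here is a back-of-envelope sketch (not a measurement) using the figures above: about 1.5 busy cores at 2500 gets/sec, with 8 cores available. The class and method names are hypothetical. The extrapolation assumes the CPU cost per get would stay the same if all cores could be kept busy, which we have not verified.

```java
public class GetCeiling {
    // CPU time spent per get, in microseconds, given how many cores are
    // busy and the observed throughput.
    public static double cpuMicrosPerGet(double coresBusy, double getsPerSec) {
        return coresBusy * 1_000_000.0 / getsPerSec;
    }

    // Throughput if every core could be kept busy at the same per-get cost.
    // Assumption: the cost per get does not change with more parallelism.
    public static double cpuBoundGetsPerSec(int cores, double cpuMicrosPerGet) {
        return cores * 1_000_000.0 / cpuMicrosPerGet;
    }

    // If a single lock serialized every get, this is the longest it could
    // be held per get (in microseconds) while still allowing this throughput.
    public static double maxSerializedMicros(double getsPerSec) {
        return 1_000_000.0 / getsPerSec;
    }

    public static void main(String[] args) {
        double cpuPerGet = cpuMicrosPerGet(1.5, 2500.0);
        System.out.printf("CPU per get: %.0f us%n", cpuPerGet);
        System.out.printf("CPU-bound ceiling on 8 cores: %.0f gets/sec%n",
                cpuBoundGetsPerSec(8, cpuPerGet));
        System.out.printf("Max serialized time per get: %.0f us%n",
                maxSerializedMicros(2500.0));
    }
}
```

This works out to about 600 microseconds of CPU per get and a CPU-bound ceiling of roughly 13,300 gets/sec on 8 cores. A single fully serialized section would have to take about 400 microseconds per get to explain a cap at 2500 gets/sec, which gives an idea of what to look for in thread dumps or a profiler.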