@patrick - We're using these settings, which I believe are based on what's recommended in the troubleshooting guide.
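Looking at the logs, I do see a lot of GC activity. For example, here the application threads were stopped for about 0.56 seconds after running for only about 0.006 seconds: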
Total time for which application threads were stopped: 0.5599050 seconds
Application time: 0.0056590 seconds
I only see this on the hosts that became unresponsive after acquiring a large number of connections.
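To put a number on how much of the wall time goes to pauses, a small script can sum the two kinds of lines quoted above and report the fraction of time the threads spent stopped. This is only a rough sketch. It assumes the lines come from HotSpot's `-XX:+PrintGCApplicationStoppedTime` and `-XX:+PrintGCApplicationConcurrentTime` output, and the function name `stopped_fraction` is hypothetical:

```python
import re

# Assumed log formats: the two HotSpot safepoint lines quoted above.
# search() (not match()) tolerates timestamps before, or extra fields after, the text.
STOPPED_RE = re.compile(r"Total time for which application threads were stopped: ([0-9.]+) seconds")
RUNNING_RE = re.compile(r"Application time: ([0-9.]+) seconds")


def stopped_fraction(lines):
    """Sum stopped and running seconds over the log lines; return (stopped, running, fraction_stopped)."""
    stopped = running = 0.0
    for line in lines:
        m = STOPPED_RE.search(line)
        if m:
            stopped += float(m.group(1))
            continue
        m = RUNNING_RE.search(line)
        if m:
            running += float(m.group(1))
    total = stopped + running
    # Guard against an empty or unrelated log.
    fraction = stopped / total if total > 0 else 0.0
    return stopped, running, fraction


if __name__ == "__main__":
    sample = [
        "Total time for which application threads were stopped: 0.5599050 seconds",
        "Application time: 0.0056590 seconds",
    ]
    s, r, f = stopped_fraction(sample)
    print(f"stopped={s:.4f}s running={r:.4f}s fraction stopped={f:.1%}")
```

On the two sample lines this reports roughly 99% of the time spent stopped. Running it over the full log of an unresponsive host and of a healthy one would show whether the difference really is in pause time.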
Any suggestions for the GC flags? If there's a better configuration, I can experiment with it and update the wiki if we discover something interesting.