We've been adapting the consensus logs for a while and I think we can finally get to the bottom of this issue. I'm attaching the logs from the 3 nodes that participated in the same config for tablet eaa1877a2b3540cf8202aff844c6ca79.
ITBLL is driving the load and eventually fails at 2016-02-15 14:53:12,005 while trying to write to node-2 (AKA a1081edd2ca24f6b9dcdd7e5000f95ec). The peer that gets stuck is node-5 (AKA cdec7fdacbac4ad1b095275b3bdbbe5c), starting from this line:
The chaos monkey running on this setup is dropping packets on one node at a time.
I'll attach the logs in a moment.