This came up in a HipChat discussion based on a consistency problem observed on a testing cluster. Todd suggested the following fix.
It is possible to lose read-your-writes consistency across a leader failure in the following scenario:
1. A client writes to the leader. The leader replicates the entry successfully, commits it locally, responds to the client, and then crashes.
2. The client reads back the data it just wrote and is routed to the new leader, which has not yet finished committing the entries in its log. The new leader responds with stale data.
One solution to this problem is to have a newly elected leader hold off on responding to "up to date" reads until all of the entries in its log from previous terms have been committed.
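To make the proposed check concrete, here is a minimal sketch of the gating condition. It is not taken from our codebase: `LogEntry`, `LeaderState`, `last_prior_term_index`, and `can_serve_up_to_date_read` are hypothetical names, and the sketch assumes a 1-based log index with `commit_index == 0` meaning nothing is committed yet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogEntry:
    term: int      # term in which the entry was appended
    payload: str   # opaque write payload (hypothetical)


@dataclass
class LeaderState:
    # Hypothetical, simplified view of a leader's state.
    current_term: int
    log: List[LogEntry] = field(default_factory=list)
    commit_index: int = 0  # 1-based index of the last committed entry; 0 = none

    def last_prior_term_index(self) -> int:
        """Return the 1-based index of the last entry from an earlier term, or 0."""
        for index in range(len(self.log), 0, -1):
            if self.log[index - 1].term < self.current_term:
                return index
        return 0

    def can_serve_up_to_date_read(self) -> bool:
        """True once every entry from a previous term has been committed."""
        return self.commit_index >= self.last_prior_term_index()


# New leader in term 3 inherits two entries; only the first is committed so far.
leader = LeaderState(
    current_term=3,
    log=[LogEntry(term=1, payload="x=1"), LogEntry(term=2, payload="x=2")],
    commit_index=1,
)
print(leader.can_serve_up_to_date_read())  # False: the read must wait

leader.commit_index = 2  # the inherited entry is now committed
print(leader.can_serve_up_to_date_read())  # True: safe to answer
```

A read handler would block (or return a retriable error) while this check is false. How the leader actually advances its commit index past the inherited entries is outside the scope of this sketch.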