Here is something to think about wrt to leader election and replica failures -
If there are 3 replicas for a partition, and the leader acks the produce request once the request is acked by the 2 followers. The produce request doesn't care about the replication factor. So if one of the followers is slow, the leader will receive less than 2 acks from the followers, and it will go ahead and send a success ACK to the producer. The replicas update their HW only on the next replica fetch response. Since the HW committer thread is running independently. it is possible that the checkpointed HW of one of the 3 replicas is lower than the others.
If at this point, if leader fails, it will trigger the leader election procedure. According to the current design proposal, any replica in the ISR can become the leader. If the replica with the lower HW becomes the leader, then it will truncate its log upto this last checkpointed HW and start taking produce requests from there. The other 2 replicas, will send ReplicaFetchRequests with an offset that doesn't exist on the leader.
Effectively, it seems that we will end up losing some successfully acknowledged produce requests. Probably, the leader election procedure should check the HW of the participating replicas and give preference to replica with highest HW ?