I've been discussing this failure I found with Yonik.
The problem seems to be that a replica tries to recover and publishes the recovering state. The attempt then fails, but docs are now coming in from the leader. When the replica tries to recover again, it has received enough docs to pass peer sync, even though the failed attempt may have left it missing earlier updates.
I'm trying a possible solution now where we won't allow peer sync after a recovery that is not successful.
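
To make the idea concrete, here is a rough sketch of the guard, not the actual patch. The class, enum, and method names are all hypothetical and do not correspond to Solr's recovery code. It only models the rule "after a failed recovery, skip peer sync and go to full replication until a recovery succeeds".

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of a replica's recovery choice; not Solr's real classes. */
public class RecoveryGuardSketch {
    /** Illustrative names for the two recovery paths. */
    enum Strategy { PEER_SYNC, FULL_REPLICATION }

    // True after a failed recovery attempt; reset only by a successful one.
    private final AtomicBoolean lastRecoveryFailed = new AtomicBoolean(false);

    /** Peer sync is only allowed when the previous recovery did not fail. */
    Strategy chooseStrategy() {
        return lastRecoveryFailed.get() ? Strategy.FULL_REPLICATION : Strategy.PEER_SYNC;
    }

    /** Record the outcome of a recovery attempt. */
    void onRecoveryFinished(boolean success) {
        lastRecoveryFailed.set(!success);
    }

    public static void main(String[] args) {
        RecoveryGuardSketch replica = new RecoveryGuardSketch();
        System.out.println(replica.chooseStrategy()); // no failure recorded yet
        replica.onRecoveryFinished(false);            // first attempt fails
        System.out.println(replica.chooseStrategy()); // peer sync is now refused
        replica.onRecoveryFinished(true);             // a later attempt succeeds
        System.out.println(replica.chooseStrategy()); // peer sync is allowed again
    }
}
```

The point of the flag is that docs buffered from the leader after a failed attempt can no longer make the replica look caught up to peer sync.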
- relates to
SOLR-8094 HdfsUpdateLog should not replay buffered documents as a replacement to dropping them.