+1 for opening a new JIRA to discuss r-o mode when an IOE occurs while flushing the journal/ledgers.
b) Read the entries and identify missing entries if any?
Yeah, the DistributionScheduling happens on the client side, and batch reading is also good.
I am thinking that the ledgers are local to the server, so how about reading them directly instead of using PerChannelBookieClient?
Oh, it seems I didn't explain this clearly in my previous comment. My thought is that a bookie server would just find the corrupted/missing entries that it should own, then schedule a re-replication procedure itself to read those entries from its sibling bookie servers (in the same quorum). So the read is a remote read from another server.
This way, we don't even need to change the metadata in ZooKeeper.
Taking the example you gave:
Say the ledger metadata mapping for entries 0-100 is
0 (A, B, C)
50(B, C, D)
B runs a scanner itself and finds that entries 30-39 are corrupted/missing. It schedules a re-replication of 30-39, which is a remote read of those entries from the other bookies in the ensemble that holds them (A or C under the mapping above, since D only holds entries from 50 onward). We don't need to change the ledger metadata; changing the ledger metadata would introduce a distributed consensus issue (you can refer to the discussion in
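To make the lookup concrete, here is a minimal sketch of how a bookie could work out which peers to read a missing entry from, using only the existing ledger metadata. The function names are hypothetical (they are not BookKeeper APIs), and the sketch assumes every entry is written to every bookie in its ensemble, i.e. no striping. With the mapping exactly as written in this thread, entries 30-39 fall in the first segment, so the candidates come out as A and C.

```python
from bisect import bisect_right


def ensemble_for_entry(ensembles, entry_id):
    """Return the ensemble that was active when entry_id was written.

    `ensembles` maps the first entry id of each segment to its bookies,
    mirroring the "0 (A, B, C) / 50 (B, C, D)" notation used above.
    """
    starts = sorted(ensembles)
    idx = bisect_right(starts, entry_id) - 1
    if idx < 0:
        raise ValueError(f"entry {entry_id} precedes the first ensemble")
    return ensembles[starts[idx]]


def peers_to_read_from(ensembles, me, entry_ids):
    """Map each missing entry to the other bookies in its ensemble.

    Assumption: quorum size == ensemble size, so every peer in the
    ensemble holds a copy. Entries this bookie does not own are skipped.
    """
    plan = {}
    for entry_id in entry_ids:
        ensemble = ensemble_for_entry(ensembles, entry_id)
        if me not in ensemble:
            continue
        plan[entry_id] = [b for b in ensemble if b != me]
    return plan


# The example from this thread: B scans and finds 30-39 missing.
ensembles = {0: ["A", "B", "C"], 50: ["B", "C", "D"]}
plan = peers_to_read_from(ensembles, "B", range(30, 40))
for entry_id, peers in plan.items():
    print(entry_id, peers)
```

Nothing here writes to the metadata: the plan is derived from a read-only view of it, which is the point of the proposal.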
Another tough thing is that we need to tell a closed ledger from an opened/in-recovery ledger when handling the last ensemble of an opened/in-recovery ledger.
I am missing something. Could you give more details on this?
For a closed ledger, we know the entry range of every ensemble. But for an opened/in-recovery ledger, we have no idea what the last entry of the last ensemble is.
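A small sketch of the difference, under the assumption that the metadata gives the first entry id of each ensemble plus a last entry id only once the ledger is closed. The function name and the use of `None` for "unknown" are illustrative choices, not BookKeeper APIs.

```python
def ensemble_ranges(ensembles, last_entry_id=None):
    """Return (first_entry, last_entry, bookies) for each ensemble.

    `last_entry_id` is only known for a closed ledger. For an
    opened/in-recovery ledger it is None, so the end of the final
    range is reported as None (unknown) rather than guessed.
    """
    starts = sorted(ensembles)
    ranges = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            # A later ensemble exists, so this one ends just before it.
            end = starts[i + 1] - 1
        else:
            # Last ensemble: bounded only if the ledger is closed.
            end = last_entry_id
        ranges.append((start, end, ensembles[start]))
    return ranges


ensembles = {0: ["A", "B", "C"], 50: ["B", "C", "D"]}

# Closed ledger with last entry 100: every range is bounded.
closed_ranges = ensemble_ranges(ensembles, last_entry_id=100)

# Opened/in-recovery ledger: the last range has no known end.
open_ranges = ensemble_ranges(ensembles)

print(closed_ranges)
print(open_ranges)
```

A scanner on B could safely verify 0-49 in both cases, but for the open ledger it cannot tell whether an entry absent from the 50+ range is missing or simply not written yet.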