We have identified multiple issues with BK compaction.
This issue is to list all of them in one Jira ticket.
MajorCompaction and MinorCompaction are very basic. Either they do it or won’t do it. Proposal is to add Low Water Mark(LWM) and High Water Mark(HWM) to the disk space. Have different compaction frequency and re-claim %s when the disk space is < low water mark , > LWM < HWM, > HWM.
MajorCompaction and Minor Compactions are strictly frequency based. They should at least time of the day based, and also run during low system load, and if the system load raises, reduce the compaction depending on the disk availability
Current code disables compaction when disk space grows beyond configured threshold. There is no exit from this point. Have an option to keep reserved space for compaction, at least 2 entryLog file sizes when isForceGCAllowWhenNoSpace enabled.
Current code toggles READONLY status of the bookie as soon as it falls below the disk storage threshold. Imagine if we keep 95% as the threshold, Bookie becomes RW as soon as it falls below 95 % and few more writes pushes it above 95 and it turns back to RONLY. Use a set of defines (another set of LWM/HWM?) where Bookie turns RO on high end and won't become RW until it hits low end.
Current code never checks if the compaction is enabled or disabled once the major/minor compaction is started. If the bookie goes > disk threshold (95%) and at that compaction is going on, it never checks until it finishes but there may not be disk available for compaction to take place. So check if compaction is enabled after processing every EntryLog.
Current code changes the Bookie Cookie value even when new storage is added. When the cookie changes Bookie becomes a new one, and BK cluster treats it as new bookie. If we have mechanism to keep valid cookie even after adding additional disk space, we may have a chance to bring the bookie back to healthy mode and have compaction going.
CheckPoint was never attempted to complete after once sync failure. There is a TODO in the code for this area.
When the disk is above threshold, Bookie goes to RO. If we have to restart the bookie, on the way back, bookie tries to create new entrylog and other files, which will fail because disk usage is above threshold, hence bookie refuses to come up.