Affects Version/s: None
Fix Version/s: None
In case of multiple subsequent splits with an open handle on an old reference file, the result can be a split region that is never cleaned up.
There are two issues here:
- A region is split even when it still holds references to its parent.
- A region goes offline (archive mode) even when references are still pending in its store.
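The two guards these issues call for can be sketched with a toy model. This is a minimal, hypothetical illustration, not HBase code: `SketchStore`, `SplitGuardSketch`, `canGoOffline` and the `"ref:"` naming convention for reference files are all invented for the example. The assumed key point is that the reference check looks at compacted-but-not-yet-archived files too, not only at the live in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a store; NOT the real HBase HStore API.
final class SketchStore {
    // Names of live store files, i.e. the in-memory list readers see.
    final List<String> liveFiles = new ArrayList<>();
    // Files already compacted away but not yet archived because a reader holds them open.
    final List<String> compactedNotArchived = new ArrayList<>();

    // Assumption for this sketch only: reference files carry a "ref:" prefix.
    static boolean isReference(String file) {
        return file.startsWith("ref:");
    }

    // Checks BOTH lists, so a reference removed from the live list by compaction
    // but still present on disk keeps counting.
    boolean hasReferences() {
        return liveFiles.stream().anyMatch(SketchStore::isReference)
            || compactedNotArchived.stream().anyMatch(SketchStore::isReference);
    }
}

final class SplitGuardSketch {
    // Issue 1: do not split a region that still references its parent.
    static boolean shouldSplit(SketchStore store) {
        return !store.hasReferences();
    }

    // Issue 2: do not take a region offline (archive mode) while references are pending.
    static boolean canGoOffline(SketchStore store) {
        return !store.hasReferences();
    }

    public static void main(String[] args) {
        SketchStore da = new SketchStore();
        da.liveFiles.add("compacted-1");            // output of the major compaction
        da.compactedNotArchived.add("ref:P-hfile"); // reference still held open by a reader
        System.out.println(shouldSplit(da));        // prints false
        da.compactedNotArchived.clear();            // reader closed, file archived
        System.out.println(shouldSplit(da));        // prints true
    }
}
```

Using one `hasReferences()` for both guards keeps the split decision and the offline decision consistent, so a region cannot pass one check while failing the other.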
Steps to reproduce:
- Split region P into daughters DA and DB.
- After the split, before major compaction starts, open a handle on a store file of each new region (DA and DB).
- Let compaction complete on DA. The compaction does not clear the reference store files because they are still open.
- Split the new region DA again. shouldSplit returns true because, even before the compaction cleanup runs, the compacted files and references have already been removed from the in-memory list.
- CatalogJanitor now does not remove DA because it still has store references, and major compaction/CompactedHFilesDischarger does not clean it up because it only looks at online regions.
- After the above steps, region DA, which is offline, stays a split region forever and is never cleaned up.
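The repro timeline can be replayed in a toy simulation to show why DA gets stranded. This is a hypothetical sketch: `ReproSketch`, its `Region` class, the `"ref:"` prefix, and the step where the reader eventually closes its handle are assumptions made for illustration, not HBase classes or behavior taken from the attached test. The naive split check consults only the in-memory list, and the discharger skips offline regions, so the reference stays on disk even after the handle is closed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the repro steps; none of these classes are HBase APIs.
final class ReproSketch {
    static final class Region {
        final String name;
        boolean online = true;
        // In-memory file list that the naive split check consults.
        final Set<String> inMemoryFiles = new HashSet<>();
        // Compacted files kept on disk because a reader still had them open.
        final Set<String> pendingOnDisk = new HashSet<>();
        Region(String name) { this.name = name; }
    }

    // Step 4: the naive check looks only at the in-memory list.
    static boolean naiveShouldSplit(Region r) {
        return r.inMemoryFiles.stream().noneMatch(f -> f.startsWith("ref:"));
    }

    // Step 3: compaction replaces files in memory; files held open stay on disk.
    static void compact(Region r, Set<String> openFiles) {
        for (String f : r.inMemoryFiles) {
            if (openFiles.contains(f)) r.pendingOnDisk.add(f);
        }
        r.inMemoryFiles.clear();
        r.inMemoryFiles.add("compacted");
    }

    // Step 5: this discharger only visits online regions.
    static void discharge(List<Region> regions, Set<String> openFiles) {
        for (Region r : regions) {
            if (!r.online) continue;
            r.pendingOnDisk.removeIf(f -> !openFiles.contains(f));
        }
    }

    static Region runRepro() {
        Region da = new Region("DA");
        da.inMemoryFiles.add("ref:P-hfile");                          // step 1: DA references parent P
        Set<String> openFiles = new HashSet<>(Set.of("ref:P-hfile")); // step 2: handle opened
        compact(da, openFiles);                                       // step 3
        if (naiveShouldSplit(da)) da.online = false;                  // step 4: DA split, goes offline
        openFiles.clear();                                            // assumed: reader closes the handle later
        List<Region> regions = new ArrayList<>();
        regions.add(da);
        discharge(regions, openFiles);                                // step 5: DA skipped, it is offline
        return da;
    }

    public static void main(String[] args) {
        Region da = runRepro();
        System.out.println(da.online + " " + da.pendingOnDisk); // prints: false [ref:P-hfile]
    }
}
```

Closing the handle in the simulation makes the point sharper: the file is no longer in use, yet nothing ever visits DA to archive it.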
We also found that CatalogJanitor cannot clean up offline regions (split parents), because they are still referenced by their daughters, which are themselves never cleaned up. As a result, many store files are never cleaned, consuming extra space in the local index store and leaving many split regions lingering.
A unit test reproducing the scenario is attached.
The fix could go in CompactedHFilesDischarger or CatalogJanitor to handle such cases: even when offline split regions like this exist, they should be able to clean themselves up.
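One possible shape of the discharger-side fix, as a hypothetical sketch only: `DischargeFixSketch`, `shouldVisit` and `janitorCanClean` are invented names, and this models the idea rather than HBase's actual CompactedHFilesDischarger or CatalogJanitor. The assumed change is that the discharger also visits offline split regions that still have compacted files waiting to be archived, after which a janitor-style check can treat the region as cleanable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed fix; NOT HBase's CompactedHFilesDischarger.
final class DischargeFixSketch {
    static final class Region {
        final String name;
        final boolean online;
        final boolean splitParent;
        // Compacted files not yet archived because a reader held them open.
        final Set<String> compactedNotArchived = new HashSet<>();
        Region(String name, boolean online, boolean splitParent) {
            this.name = name;
            this.online = online;
            this.splitParent = splitParent;
        }
    }

    // Visit online regions as before, plus offline split regions with pending files.
    static boolean shouldVisit(Region r) {
        return r.online || (r.splitParent && !r.compactedNotArchived.isEmpty());
    }

    // Archives every pending file that no reader holds open; returns "region/file" names.
    static List<String> discharge(List<Region> regions, Set<String> openFiles) {
        List<String> archived = new ArrayList<>();
        for (Region r : regions) {
            if (!shouldVisit(r)) continue;
            // Iterate over a copy so entries can be removed from the original set.
            for (String f : new ArrayList<>(r.compactedNotArchived)) {
                if (!openFiles.contains(f)) {
                    r.compactedNotArchived.remove(f);
                    archived.add(r.name + "/" + f);
                }
            }
        }
        return archived;
    }

    // Janitor-style check: an offline split region is cleanable once nothing is pending.
    static boolean janitorCanClean(Region r) {
        return !r.online && r.splitParent && r.compactedNotArchived.isEmpty();
    }

    public static void main(String[] args) {
        Region da = new Region("DA", false, true);
        da.compactedNotArchived.add("ref:P-hfile");
        List<Region> regions = List.of(da);
        System.out.println(discharge(regions, Set.of())); // prints [DA/ref:P-hfile]
        System.out.println(janitorCanClean(da));          // prints true
    }
}
```

Files still held open are skipped and retried on the next run, so the fix never archives a file a reader is using.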