Resolution: Not A Bug
Affects Version/s: 1.8.9
Fix Version/s: None
We found that oak-run datastorecheck falsely reports missing blobs when run without the --verbose option.
Even the online datastore consistency check falsely reports the same missing blobs.
This happens because the standard blob reference collector in oak-run datastorecheck looks at all compaction generations in the segment store instead of only the last one.
After running an offline compaction, and thus keeping only 1 generation, the correct number of blob references and missing blobs is reported by oak-run datastorecheck.
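To illustrate the effect described above, here is a minimal toy model. It is not Oak code: the Segment class, the helper methods, and the blob ids are all hypothetical and only mimic the idea that references collected from older retained generations can point at blobs that are no longer in the datastore.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these types exist in Oak.
class GenerationReferenceSketch {

    /** Hypothetical stand-in for a segment: a compaction generation plus the blob ids it references. */
    static final class Segment {
        final int generation;
        final Set<String> blobRefs;

        Segment(int generation, String... blobRefs) {
            this.generation = generation;
            this.blobRefs = new TreeSet<>(Arrays.asList(blobRefs));
        }
    }

    /** Collects blob references from every segment whose generation is at least minGeneration. */
    static Set<String> collectReferences(List<Segment> segments, int minGeneration) {
        Set<String> refs = new TreeSet<>();
        for (Segment segment : segments) {
            if (segment.generation >= minGeneration) {
                refs.addAll(segment.blobRefs);
            }
        }
        return refs;
    }

    /** Returns the referenced blob ids that are not present in the datastore. */
    static Set<String> missingBlobs(Set<String> referenced, Set<String> dataStore) {
        Set<String> missing = new TreeSet<>(referenced);
        missing.removeAll(dataStore);
        return missing;
    }

    /** Assumed sample data: generation 1 still references blob-a, generation 2 does not. */
    static List<Segment> sampleSegments() {
        return Arrays.asList(
                new Segment(1, "blob-a", "blob-b"),
                new Segment(2, "blob-b", "blob-c"));
    }

    /** Assumed datastore content: blob-a is no longer present. */
    static Set<String> sampleDataStore() {
        return new TreeSet<>(Arrays.asList("blob-b", "blob-c"));
    }

    public static void main(String[] args) {
        List<Segment> segments = sampleSegments();
        Set<String> dataStore = sampleDataStore();

        // Looking at all retained generations reports blob-a as missing.
        System.out.println("all generations:  " + missingBlobs(collectReferences(segments, 1), dataStore));
        // Looking only at the latest generation reports nothing missing.
        System.out.println("latest only:      " + missingBlobs(collectReferences(segments, 2), dataStore));
    }
}
```

In this sketch, scanning both generations reports blob-a as missing, while scanning only generation 2 reports no missing blobs, which matches what we saw after offline compaction left a single generation.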
The bug on the 1.8 branch comes from org.apache.jackrabbit.oak.plugins.blob.BlobReferenceRetriever#collectReferences (line 429). Following that call, you arrive at org.apache.jackrabbit.oak.segment.file.FileStore#tarFiles (line 1013), which contains:
newOldReclaimer(lastCompactionType, getGcGeneration(), gcOptions.getRetainedGenerations()));
I'm not familiar enough with this source code, so I won't attempt adding a patch.
I did double-check trunk and saw the same line of code there: org.apache.jackrabbit.oak.segment.file.GarbageCollector#collectBlobReferences (line 324).
I attached a text file with the outputs of the commands I ran.
We currently run Oak 1.8.9 with AEM 184.108.40.206 and oak-blob-cloud 1.8.9 from the 1.8.3 AEM S3 connector.