While running a blob store consistency check on an AEM 6.3 instance, I noticed that a large number of blobs were reported missing. When I checked the list of missing blobs, I couldn't find any that were actually missing. After investigating the Oak sources, I narrowed the issue down to the code that sorts the marked blob references. That code is supposed to remove duplicates while sorting, but in some circumstances it doesn't remove all of them. The remaining duplicates then show up as bogus missing blobs, because they aren't matched by an equal number of duplicate lines in the "available blobs" list.
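To illustrate why a leftover duplicate turns into a bogus "missing" entry, here is a simplified model of the comparison step. This is not Oak's actual code. It assumes both lists are already sorted and that each line in the "available" list can satisfy at most one line in the "marked" list. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MissingBlobCheck {
    // Hypothetical model: walks two sorted lists in lockstep. Each "available"
    // line can satisfy at most one "marked" line, so an unmatched duplicate in
    // "marked" is reported as missing even though the blob exists.
    static List<String> findMissing(List<String> marked, List<String> available) {
        List<String> missing = new ArrayList<>();
        int i = 0, j = 0;
        while (i < marked.size()) {
            if (j >= available.size()) {
                missing.add(marked.get(i++)); // nothing left to match against
                continue;
            }
            int cmp = marked.get(i).compareTo(available.get(j));
            if (cmp == 0) {
                i++; // matched: consume one line from each list
                j++;
            } else if (cmp < 0) {
                missing.add(marked.get(i++)); // marked but not available
            } else {
                j++; // available but not marked (a GC candidate, not our concern here)
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> available = List.of("a1", "b2", "c3");
        System.out.println(findMissing(List.of("a1", "b2", "c3"), available));       // []
        System.out.println(findMissing(List.of("a1", "b2", "b2", "c3"), available)); // [b2]
    }
}
```

With correctly deduplicated input the check reports nothing; a single surviving duplicate of `b2` is enough to produce a false positive.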
As far as I can tell, the bug only manifests on installations that use DocumentNodeStore (where the marked blob IDs also contain the referencing node), and only once the number of blobs in the blob store reaches a certain threshold (at which point the sort code sorts in chunks and then merges them, instead of sorting everything in memory at once). This makes it hard to reproduce in a development environment that only has dummy content.
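For reference, this is the merge behaviour I'd expect from the chunked path, sketched as a k-way merge over chunks that are each already sorted. It is an illustration under stated assumptions, not Oak's sorter: the chunks are in-memory lists rather than temporary files, and the line format `blobId,nodeId` (with the blob ID as the deduplication key) is a hypothetical stand-in for the DocumentNodeStore entries. The key point is that deduplication compares against the last emitted key, so duplicates that land in different chunks are removed too, not only adjacent ones within a single chunk.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class ChunkMerge {
    // Hypothetical line format "blobId,nodeId"; the dedup key is the blob ID.
    static String key(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    // One cursor per sorted chunk: the current head line plus the remaining lines.
    private static final class Cursor {
        String head;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.head = it.next();
        }

        boolean advance() {
            if (!rest.hasNext()) return false;
            head = rest.next();
            return true;
        }
    }

    // Merges chunks that are each sorted by key and emits every key exactly once.
    static List<String> mergeDistinct(List<List<String>> chunks) {
        PriorityQueue<Cursor> pq = new PriorityQueue<>(
                Comparator.comparing((Cursor c) -> key(c.head)));
        for (List<String> chunk : chunks) {
            if (!chunk.isEmpty()) pq.add(new Cursor(chunk.iterator()));
        }
        List<String> out = new ArrayList<>();
        String lastKey = null;
        while (!pq.isEmpty()) {
            Cursor c = pq.poll();           // smallest head across all chunks
            String k = key(c.head);
            if (!k.equals(lastKey)) {       // skip duplicates, even across chunks
                out.add(k);
                lastKey = k;
            }
            if (c.advance()) pq.add(c);     // re-queue with its next line
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> chunks = List.of(
                List.of("a1,/x", "b2,/y"),
                List.of("b2,/z", "c3,/y"),
                List.of("a1,/w"));
        System.out.println(mergeDistinct(chunks)); // [a1, b2, c3]
    }
}
```

Here `a1` and `b2` each appear in two different chunks, yet each is emitted once; a merge that only deduplicated within a chunk would leave both duplicates in the output.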
I'll attach a proposed patch to this ticket that contains a fix and a test case verifying the correct merge logic. Please let me know if you need reproduction steps to work on this; I'd rather not provide them, though, because the only place I can reproduce the issue has a blob store over 1 TB in size.