This can only happen in a few cases, but the exact buggy behaviour can be reproduced:
It looks like, in some cases, WeakReferences are already cleared before they appear in the ReferenceQueue (which is perfectly fine). In that case, the iterator filters out the cleared entries, but a later size() call still counts them.
This is arguably a "bug" in the test (wrong assumption: the ReferenceQueue will already contain the items by the time they are cleared). But we can do better (and make the test work): we can keep the backing map up to date by removing GCed entries in the iterator as soon as it encounters them.
I will attach a patch with some minor cleanups to make the code look like FilterIterator.
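
To illustrate the idea of pruning inside the iterator, here is a minimal sketch. It is not the patch itself: `PruningWeakList` is a hypothetical class made up for this example, it uses a plain list instead of the real backing map, and it calls `Reference.clear()` by hand to simulate a reference that has been cleared but not yet enqueued.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical illustration only; not the class discussed in this issue. */
class PruningWeakList<T> implements Iterable<T> {
    private final List<WeakReference<T>> backing = new ArrayList<>();

    /** Returns the reference so the demo can clear it by hand. */
    WeakReference<T> add(T value) {
        WeakReference<T> ref = new WeakReference<>(value);
        backing.add(ref);
        return ref;
    }

    /** Counts all entries still held in the backing list, cleared or not. */
    int size() {
        return backing.size();
    }

    @Override
    public Iterator<T> iterator() {
        final Iterator<WeakReference<T>> refs = backing.iterator();
        return new Iterator<T>() {
            // Strong reference so the pending element cannot vanish
            // between hasNext() and next().
            private T next;

            @Override
            public boolean hasNext() {
                while (next == null && refs.hasNext()) {
                    next = refs.next().get();
                    if (next == null) {
                        // Referent already cleared: prune it right away
                        // instead of waiting for a ReferenceQueue.
                        refs.remove();
                    }
                }
                return next != null;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T result = next;
                next = null;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        PruningWeakList<String> list = new PruningWeakList<>();
        list.add("a");
        WeakReference<String> refB = list.add("b");
        list.add("c");

        // Simulate "cleared but not yet enqueued": clear() does not enqueue.
        refB.clear();
        System.out.println("size before iterating: " + list.size());

        int seen = 0;
        for (String s : list) {
            seen++;
        }
        System.out.println("iterated: " + seen + ", size after: " + list.size());
    }
}
```

Before iterating, size() still counts the cleared entry; after a full pass, the iterator has pruned it, so size() agrees with the number of elements actually seen. Without the `refs.remove()` call the two numbers would keep disagreeing, which is the mismatch the test runs into.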