It turns out this is a type problem, but not the one I guessed above. The "existing document" values aren't Strings - they're Longs!
The code in question is the removeObj method here:
  private void removeObj(@SuppressWarnings({"rawtypes"})Collection original, Object toRemove, String fieldName) {
    if(isChildDoc(toRemove)) {
      removeChildDoc(original, (SolrInputDocument) toRemove);
    } else {
      original.remove(getNativeFieldValue(fieldName, toRemove));
    }
  }
When removing an int value, getNativeFieldValue is consistent in always returning an integer. But the 'original' Collection of existing values has different types depending on whether it was retrieved from the update log (longs) or from the index (ints).
Java's Number classes declare equals() in such a way that a Long is never equal to an Integer, even when the two represent the same numeric quantity. So a type mismatch causes the remove attempt to fail when the existing doc is retrieved from the tlog.
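To make the equals() behavior concrete, here's a small standalone sketch. It doesn't touch any Solr code; the class and method names (NumberEqualsDemo, removes) are made up for illustration, and the two collections only stand in for "values from the index" and "values from the tlog".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

public class NumberEqualsDemo {
  /** Tries to remove value from a copy of existing; returns true if something was removed. */
  static boolean removes(Collection<Object> existing, Object value) {
    // value is typed as Object, so this resolves to remove(Object), not remove(int index)
    return new ArrayList<>(existing).remove(value);
  }

  public static void main(String[] args) {
    // Stand-ins for the existing field values (hypothetical data)
    Collection<Object> fromIndex = Arrays.<Object>asList(1, 2, 3);    // Integers
    Collection<Object> fromTlog = Arrays.<Object>asList(1L, 2L, 3L);  // Longs

    System.out.println(Long.valueOf(2L).equals(Integer.valueOf(2))); // false
    System.out.println(removes(fromIndex, 2)); // true:  Integer matches Integer
    System.out.println(removes(fromTlog, 2));  // false: Integer never equals Long
  }
}
```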
This cause has one upside - while it's likely to affect int/long and double/float, it seems specific to numerics and doesn't translate to other types.
There are a couple of ways we could address this:
1. Make sure the tlog's SolrInputDocument has int values instead of longs where appropriate.
2. Special-case numeric values in AtomicUpdateDocumentMerger.removeObj and add custom removal logic that handles the inconsistency in our input types.
Conceptually I like (1) much better - it'd be nice if the atomic-update code (and anything else that pulls RTG values) didn't have to handle a bunch of arbitrary variations in its input. But this could be a big change and might run afoul of whatever legitimate reasons there might be for storing the values as Longs in the UpdateLog.
Anyway, I'll probably go with the admittedly-hackier custom logic approach, despite not liking it. If anyone sees a better way forward though, please let me know.
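For the sake of discussion, here's a rough sketch of what the custom removal logic might look like. This is not the actual patch: NumericAwareRemoval, removeValue, sameNumber and isIntegral are all hypothetical names, and it assumes that integral types can be compared as longs and that float values in the tlog are plain widenings to double.

```java
import java.util.Collection;
import java.util.Iterator;

public class NumericAwareRemoval {
  /** True for the boxed integral types (hypothetical helper). */
  static boolean isIntegral(Number n) {
    return n instanceof Byte || n instanceof Short || n instanceof Integer || n instanceof Long;
  }

  /** Compares two Numbers by value rather than by boxed type. */
  static boolean sameNumber(Number a, Number b) {
    if (isIntegral(a) && isIntegral(b)) {
      return a.longValue() == b.longValue();
    }
    if (!isIntegral(a) && !isIntegral(b)) {
      // Assumption: a Float widened to Double compares equal to the original Float
      return Double.compare(a.doubleValue(), b.doubleValue()) == 0;
    }
    return false; // don't treat 2 and 2.0 as the same value
  }

  /** Removes the first element equal to toRemove, ignoring int/long and float/double differences. */
  static boolean removeValue(Collection<Object> original, Object toRemove) {
    if (!(toRemove instanceof Number)) {
      return original.remove(toRemove); // non-numerics keep the existing behavior
    }
    for (Iterator<Object> it = original.iterator(); it.hasNext(); ) {
      Object existing = it.next();
      if (existing instanceof Number && sameNumber((Number) existing, (Number) toRemove)) {
        it.remove();
        return true;
      }
    }
    return false;
  }
}
```

In removeObj this would take the place of the plain original.remove(...) call in the else branch, with the result of getNativeFieldValue passed as toRemove.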