Maybe we could even go further and add an identifier of the Sorter which has been used to sort the segment
+1. This makes sense. We need to be as robust as possible. If a user makes a mistake, it's best if we keep them from tripping themselves up. It needs to be something unique, i.e. not just the sorter class, but e.g. for NumericDV also the field. Perhaps Sorter should have a sortKey? Then we record Sorter.class_Sorter.sortKey?
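To make the idea concrete, here is a rough sketch of what recording and checking such an identifier could look like. Everything in it is hypothetical: the "sorter" diagnostics key, the class name and the helper methods are made up for illustration and are not existing Lucene API. Plain strings and maps stand in for Sorter and the segment diagnostics.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Lucene.
final class SorterDiagnostics {
    // Assumed diagnostics key under which the sorter identifier is stored.
    static final String SORTER_ID_KEY = "sorter";

    // Builds a unique identifier from the sorter class name and a
    // sorter-specific key (e.g. the field name for a NumericDV sorter).
    static String sorterId(String sorterClassName, String sortKey) {
        return sorterClassName + "_" + sortKey;
    }

    // Returns a copy of the diagnostics with the sorter identifier recorded.
    static Map<String, String> record(Map<String, String> diagnostics, String sorterId) {
        Map<String, String> copy = new HashMap<>(diagnostics);
        copy.put(SORTER_ID_KEY, sorterId);
        return copy;
    }

    // True only if the segment was sorted by exactly the expected sorter.
    static boolean sortedBy(Map<String, String> diagnostics, String expectedSorterId) {
        return expectedSorterId.equals(diagnostics.get(SORTER_ID_KEY));
    }
}
```

With something like this, a segment sorted on one field would not be mistaken for one sorted on another field by the same sorter class, and a segment with no recorded identifier would simply be treated as unsorted.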
I agree that addIndexes should use MergePolicy. The Directory version shallow-copies the segments, including whatever Diagnostics information they contain. The IR version, on the other hand, uses SegmentMerger but bypasses MP. So e.g. if the app uses TieredMP, limiting the merged segment size to 10 GB, you can addIndexes a 20-segment index, totalling 100 GB, and end up with a single 100 GB segment. That's ... unexpected.
So I think we need something on MP, maybe findMergesForAddIndexes... and then it will be easier to control how these indexes are added. If that's the direction, perhaps we do this in a different issue, as it's unrelated to sorting?
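Just to illustrate what such a hook might decide, here is a minimal sketch under my own assumptions: findMergesForAddIndexes is only the name floated above, not an existing method, and the greedy in-order packing is one possible strategy, not what TieredMP does today. Sizes are plain longs (GB here) so the example runs without Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not an existing MergePolicy method.
final class AddIndexesPlanner {
    // Greedily packs the incoming reader sizes, in order, into groups whose
    // total stays within maxMergedSize. A single reader that is already
    // larger than the limit ends up in a group of its own.
    static List<List<Long>> findMergesForAddIndexes(List<Long> readerSizes, long maxMergedSize) {
        List<List<Long>> merges = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : readerSizes) {
            // Close the current group if adding this reader would exceed the limit.
            if (!current.isEmpty() && currentSize + size > maxMergedSize) {
                merges.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            merges.add(current);
        }
        return merges;
    }
}
```

For the example above, assuming the 20 segments are 5 GB each, a 10 GB limit would yield 10 merges of two segments each instead of one 100 GB segment.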
And, while diagnostics allow us to record sorted + sorter, we're still limited to SegmentReader. In practice this may not be a true limitation, but I feel that if AtomicReader exposed metadata(), like commitData() for the composite, it would give us more freedom. This collector does not need to be limited to SegmentReader only ... but I guess it's OK for now; at least, I know others don't like the idea of having metadata() on AR.