The approach taken by this issue (well, originally by the 3x IndexSorter) is that you sort an entire index offline. So if you have, e.g., a 10-segment index, you end up with a single segment, totally sorted across all documents.
At least from my understanding of how the online sorting would work (LUCENE-4752), the Codec would need to determine the permutation of the documents beforehand, or build an in-memory segment and then, when it's done, sort it and write it sorted, right? Otherwise, I don't understand how it can handle this series of addDocuments (assume the value denotes the location of the document in the sorted index): doc(2), doc(1), doc(7), doc(0)...? The stored fields and term-vectors are not cached in-memory today. The location of a document in the sorted index is unknown until all keys (by which you sort) have been encountered, which may be too late for the Codec?
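To make the ordering problem concrete, here is a minimal sketch in plain Java (no Lucene APIs; the class and method names are hypothetical). It assumes each document has a single long sort key and treats the four values from the example above as those keys. The point is only that the old-to-new docID mapping cannot be computed until every key has been seen.

```java
import java.util.Arrays;
import java.util.Comparator;

public class DocPermutation {
    // Returns oldToNew[i] = position in the sorted index of the i-th added document.
    // Needs ALL keys up front: a later, smaller key shifts every earlier document.
    static int[] oldToNew(long[] keys) {
        Integer[] order = new Integer[keys.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // order[newId] = arrival index of the document that ends up at newId
        Arrays.sort(order, Comparator.comparingLong(i -> keys[i]));
        int[] oldToNew = new int[keys.length];
        for (int newId = 0; newId < order.length; newId++) {
            oldToNew[order[newId]] = newId;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        long[] keys = {2, 1, 7, 0}; // arrival order of the sort keys
        System.out.println(Arrays.toString(oldToNew(keys))); // prints [2, 1, 3, 0]
    }
}
```

Note that the first document cannot be written to its final position until the last one (key 0) arrives, which is why anything written in a streaming fashion, such as stored fields, is a problem.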
And even if you get past that hurdle (say you're willing to cache everything in-memory and then flush to disk sorted), how will you handle merges? So now you have an index with segments 1,2,3 (each sorted). How do you merge-sort them? Today, you don't have the API for it, so let's say that we add it (plugging in your own SegmentMerger). Now MP selects segments 1,2 for merge, so you end up with segments 3,4, which are again each sorted separately, but the index is not globally sorted, right? In a sorted index, the segments need to have a consistent > (or <) relationship between them; otherwise you're just traversing documents in random order.
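The merge scenario can be sketched the same way. This is plain Java, not a SegmentMerger; the segment contents are made-up sort keys and all names are hypothetical. Segments 1 and 2 are merge-sorted into segment 4, and the index is then read as segment 3 followed by segment 4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegmentMergeOrder {
    // Standard two-way merge of two individually sorted key lists.
    static List<Integer> mergeSorted(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    static boolean isSorted(List<Integer> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1) > keys.get(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> seg1 = Arrays.asList(0, 5, 9);
        List<Integer> seg2 = Arrays.asList(2, 6, 7);
        List<Integer> seg3 = Arrays.asList(1, 3, 8);

        List<Integer> seg4 = mergeSorted(seg1, seg2); // [0, 2, 5, 6, 7, 9]

        // The index now consists of segments 3 and 4, traversed one after the other.
        List<Integer> index = new ArrayList<>(seg3);
        index.addAll(seg4);

        System.out.println(isSorted(seg3) && isSorted(seg4)); // prints true
        System.out.println(isSorted(index));                  // prints false
    }
}
```

Each segment stays sorted, but the key ranges of segments 3 and 4 overlap, so a traversal across segment boundaries is not in sorted order.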
In short, if you do come up with a reasonable way to do online index sorting (on LUCENE-4752), I'll be all for it. And if it makes sense, we can even drop the offline index sorter too. But I think that there are many challenges in getting it right, and efficiently. It's not a mere Codec trick IMO.
Also, note that as far as memory consumption for offline sorting goes, we only cache in memory the posting list that is currently being sorted (the rest relies on the pre-existing random-access APIs).
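As a rough illustration of why one posting list at a time is enough, here is a sketch in plain Java. It is not the actual sorter code, the names are hypothetical, and it handles doc IDs only (real postings also carry frequencies, positions and so on). It assumes the old-to-new docID permutation has already been computed.

```java
import java.util.Arrays;

public class PostingRemap {
    // Maps each old docID in one posting list to its new docID, then re-sorts
    // the list so it is again in increasing docID order. Only this one list
    // needs to be held in memory while it is being processed.
    static int[] remap(int[] postings, int[] oldToNew) {
        int[] out = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            out[i] = oldToNew[postings[i]];
        }
        Arrays.sort(out);
        return out;
    }

    public static void main(String[] args) {
        int[] oldToNew = {2, 1, 3, 0}; // old docID -> new docID
        int[] postings = {0, 3};       // a term that occurs in old docs 0 and 3
        System.out.println(Arrays.toString(remap(postings, oldToNew))); // prints [0, 2]
    }
}
```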
But I could be totally missing your idea for online sorting, in which case I'd appreciate it if you could elaborate on how you think it can be done. But I prefer that we discuss that on