Thanks Adrien Grand!
I folded in most of your feedback, except:
The only thing I am slightly worried about is how all optimized bulk mergers need to opt out if a sort order is configured. I am wondering if our base consumer classes should have two merge methods so that you would not have to check the sort order when overriding the method for regular merges? This is just an idea, it has drawbacks too since there would not be a single entry point to merging anymore and we would need another method in our API, but I'm suggesting it anyway hoping that it might give somebody a better idea.
I think it's OK to keep a single merge method? This merge method
already must deal with wild per-segment variabilities, e.g. different
fields across segments, some have deletions some don't, etc., so I
don't think we need to single out "has an index sort" into a separate
Also, implementing merge methods is really an uber-expert thing to
do, so such devs should be up to the task of handling an incoming
index sort, I think.
I think this is buggy since it ignores null sorts at the beginning of the list but not at the end,
Nice catch! I added test showing the bug, and then fixed it (pushed).
Let's remove it for now and later see whether this is something that could be added back?
OK I did that. I think at least there is a simple solution for doc-block
users: just index a doc values field with the "id" for each block, and
then sort on that.
but leveraging index sorting at search time looks like a big task to me so maybe we should defer it to a follow-up issue like sorting on flush?
I did move the early terminating to core, and I do think going forward
we should make it easier to use this ... it should somehow be the
default, and not a "make your own Collector" situation ...
As Rob has pointed out, even today (before promoting index sorting)
we could early-terminate in cases where the query is sorting on
index order, such as collecting first N hits for a filter.
But I agree we should do this separately. I will open follow-on issues
for "can we sort on flush too" and "searching should take advantage
of index sort by default".
Should DocIdMerger.Sub.nextDoc throw an IOException?
I tried this out, but it started to sprawl: the doc values all wrap
`DocIdMerger` under a java `Iterator` which cannot throw `IOException`
... I could move the `try/except` up there, but there are many places
I'd have to move this to, so leaving it where it is seemed like the