> I tried to make sense of the existing NearSpans implementation over the weekend ... I did not succeed.
> I still haven't had a chance to look at the new one in LUCENE-413, but I want to clarify
> something you said..
For the unordered case, the priority queue implementation over the subspans in the current NearSpans is fine.
For the ordered case, I could not figure out how to deal with the priority queue and the restriction on
ordering at the same time. This is precisely what the bug above shows.
> >>> The NearSpansOrdered there differs from the current version in that it does not
> >>> match overlapping subspans, but it passes all current test cases including TestNearSpans here.
> ...should I understand you to mean then that the current implementation of NearSpans does work
> correctly with overlapping sub-spans ... there just isn't a test for it?
For ordered queries, it might work with overlapping sub-spans in some cases.
However, I'd expect any such test to run into the bug above for some other ordered cases.
> that seems like important enough behavior that we wouldn't want to break it to fix this bug.
Given the bug, I hope nothing depends on it.
> Even if matching on overlapping subspans wasn't an intentional feature of NearSpans – the fact that it
> currently works and the documentation is silent on the issue suggests to me that it should remain supported.
That can probably be done by modifying the NearSpansOrdered of LUCENE-413 at lines 133-138 and at
line 167, where the end of the previous (possibly matching) subspans is compared to the start of the next one.
These comparisons could instead compare the start of the previous subspans with the start of the next one.
I don't know precisely what the intended behaviour is, so I can't say whether these changed comparisons
should allow equality or not. Perhaps the ends should be compared when the starts are equal,
just as is done in the priority queue for the unordered case.
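
To make the two comparisons concrete, here is a minimal sketch in Java. It is not the LUCENE-413 code: the class SpanOrderSketch, the Span holder and both method names are hypothetical, positions are assumed to be within a single document, and the choices about equality (<= in the first check, strict < in the second) are only assumptions, since the intended behaviour is not documented.

```java
// Hypothetical illustration only; none of these names come from Lucene.
final class SpanOrderSketch {

    /** Stand-in for one subspan match inside a single document. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Ordering check that rejects overlap: the end of the previous
     * subspan is compared to the start of the next one.
     * Assumption: touching spans (end == start) count as ordered.
     */
    static boolean orderedNoOverlap(Span prev, Span next) {
        return prev.end <= next.start;
    }

    /**
     * Ordering check that tolerates overlap: starts are compared first,
     * and ends are compared only when the starts are equal.
     * Assumption: identical spans do not count as ordered.
     */
    static boolean orderedAllowOverlap(Span prev, Span next) {
        if (prev.start != next.start) {
            return prev.start < next.start;
        }
        return prev.end < next.end;
    }

    public static void main(String[] args) {
        Span a = new Span(0, 3);
        Span b = new Span(2, 5); // overlaps a at position 2

        System.out.println(orderedNoOverlap(a, b));    // prints "false"
        System.out.println(orderedAllowOverlap(a, b)); // prints "true"
    }
}
```

With the overlapping pair above, the end-versus-start check rejects the match while the start-versus-start check accepts it, which is the difference in behaviour under discussion.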