The attached sloppy_phrase_java.patch.txt fixes the failing new tests.
This also covers the skipTo() bug from issue 697. The fix does not guarantee that document A B C B A would score "A B C"~4 and "C B A"~4 the same.
This fix comes with a performance cost: about 15% degradation in CPU activity of sloppy phrase scoring, as the attached perf logs show.
Operation  runCnt  recsPerRun  rec/s  elapsedSec
I think that in a real-life scenario (real index, real documents, real queries) this extra CPU will be overshadowed by I/O, but I also believe we should refrain from slowing down search, so I am unhappy with this degradation (anyone would
The perf test was done using the task benchmark framework (see issue 675). The logs also show the queries that were searched. All tests pass with the new code.
Great investigations Doron!
Personally I'm more concerned with (1) than (2). Was the fix for one issue more responsible for the performance loss than the other?
I have had similar concerns when I implemented NearSpansOrdered.java and NearSpansUnordered.java, which are in the trunk now. These match somewhat different phrases, but it would be good to ensure that the same matches score the same for spans and phrases.
The change to fix case 2 was not the main cause of the performance degradation.
I agree with Yonik that case 2 is much more important than case 1. The cost of this fix dropped from 15% more CPU time to about 3%.
Operation  runCnt  recsPerRun  rec/s  elapsedSec  avgUsedMem  avgTotalMem
Attached sloppy_phrase.patch2.txt has the updated fix, including both the java and test parts. Some of the asserts in the new tests were commented out because the patch deliberately does not fix case 1 above. Also attaching the updated perf test logs: res-search-orig2.log and res-search-new2.log.
I did not compare scoring of similar cases between sloppy phrase and near spans, as Paul suggested. Perhaps next week; I am not sure this should hold up progress on this issue.
There is no need to hold up this issue for span phrases. Perhaps a good way to get the spans and the phrases to work well together is to add a getSpans() to PhraseQuery, or to introduce a SpanPhraseQuery. But this would be better done in a new Jira issue.
There is a bug in my recent patch (sloppy_phrase.patch2.txt):
Test case - testNonExistingWrappedPhrase - was extended.
A bug in the patch (described above) was fixed. All tests pass.
Doron, sounds like this is ripe for a commit now to take care of both this and
Need to see if the parts of the test (in QueryUtils) that were disabled by
Changing the title to match what we decided to fix here.
Fixed.
Attaching for any future reference the fix that was applied for this.
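For readers following the symmetry question raised earlier in this thread (document A B C B A, queries "A B C"~4 and "C B A"~4), here is a minimal brute-force sketch of what "scoring the same" would mean. It is not the patch and not Lucene's scorer: the class and method names are hypothetical, it enumerates every combination of term positions (which a real scorer does not do), and it assumes the default sloppy frequency of 1/(matchLength + 1), where matchLength is the spread of the offset-adjusted positions. It also does nothing special for terms repeated within the phrase.

```java
public class SloppyPhraseSketch {

    /**
     * Brute-force sloppy frequency (illustrative only, hypothetical helper).
     * Every combination of one position per phrase term is examined; a
     * combination counts as a match when the spread of its offset-adjusted
     * positions is at most slop, and contributes 1 / (matchLength + 1).
     */
    static double sloppyFreq(String[] doc, String[] phrase, int slop) {
        return enumerate(doc, phrase, slop, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    private static double enumerate(String[] doc, String[] phrase, int slop,
                                    int term, int min, int max) {
        if (term == phrase.length) {
            int matchLength = max - min; // spread of adjusted positions
            return matchLength <= slop ? 1.0 / (matchLength + 1) : 0.0;
        }
        double sum = 0.0;
        for (int pos = 0; pos < doc.length; pos++) {
            if (doc[pos].equals(phrase[term])) {
                // Subtract the term's offset in the phrase, so that an
                // exact phrase match has all adjusted positions equal.
                int adjusted = pos - term;
                sum += enumerate(doc, phrase, slop, term + 1,
                        Math.min(min, adjusted), Math.max(max, adjusted));
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] doc = {"A", "B", "C", "B", "A"};
        double forward = sloppyFreq(doc, new String[] {"A", "B", "C"}, 4);
        double reversed = sloppyFreq(doc, new String[] {"C", "B", "A"}, 4);
        System.out.println("\"A B C\"~4 -> " + forward);
        System.out.println("\"C B A\"~4 -> " + reversed);
    }
}
```

Under these assumptions both queries find the same four position combinations (match lengths 0, 2, 4 and 4), so the two frequencies agree up to floating-point rounding. Exhaustive enumeration is exponential in the phrase length, which is one reason a production scorer would not work this way.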
These new tests currently fail.
This too currently fails.