|
[
Permlink
| « Hide
]
Mark Miller added a comment - 21/Nov/08 09:45 PM
Fix + test
See
I plan on committing this soon. This is a real deal breaker if you are trying to use the new getPayload API with ordered nearspans.
The attached path has java 1.5 code in the test which I'll remove. Bah. Its even worse than that. Even after you get down to a min match, it might not meet the slop requirements! You have to load the payloads and then dump them if the slop is not met.
I don't like all this extra payload loading. Come to think of it, if you don't use the getPayload, your still paying for it! I don't have a way around it, but I don't like it. In this case, not only do you pay for loading, you also pay for loading the payloads of a bunch of possible matches that don't end up being a match! Over a large index with lots of hits, its a lot of payloads to load... I havn't thought about any of it at a high level, but I think this has to be addressed somehow...maybe you have to turn on payload collecting first, or it doesnt do it? We need something... but until then, I think this still has to be fixed, and we are loading them one way or another now...might as well add a few more "possible" wrong loads (this last patch added a couple as well) to make the behavior correct - somewhat useless otherwise That still wasn't quite right. A third test and a third fix. I am pretty sure this solves it, but my previous concerns still concern me.
Hi
It seems that the fix does not cover the case where 2 terms are indexed at the same position. I attach a sample program illustrating the issue. Each 2 terms are indexed at the same position. Best regards, Jonathan Let's backport fix to 2.4 branch (for eventual 2.4.1).
Whats involved in a backport - just commit it to the 2.4 branch and thats all?
Looks like I have to look into terms indexed at the same position first - I'll try to get to that soon.
Yup. "svn merge" works well as long as the code hasn't diverged much, eg running this in a 2.4 branch checkout: svn merge -r(N-1):N https://svn.apache.org/repos/asf/lucene/java/trunk
where N was the revision committed to trunk. This is an odd one Jonathan. Its actually for the unordered case (the others were for the ordered). I am not exactly clear on whats going on yet.
When I look at the payloads coming back, it would seem we are get 0,7,7 when we should get 6,7,7. When I look at the offsets for the spans that I get the payloads from though - they appear correct. Its returning the payloads from the right offsets it seems, but somehow one of those payloads is from the term at position 0? Very odd. So when I debug in, it does indeed look like the first match happens at index 6...but the term offsets are start: 2147483647, end:-2147483648. What the heck? This is going to take some more time... Mark, I would expect to get 0,0,3,6,7,7 and not only 6,7,7.
As you wrote, "a SpanAndQuery could easily be a SpanNearQuery if a huge distance was allowed." at http://www.gossamer-threads.com/lists/lucene/java-user/51983 Hmmm...I think thats true, but thats for finding 'a hit' on a document, not for finding every possible sequence of spans that could cause hit. Spans work by finding a minimum match, not greedily finding every match (which is a different algorithm).
This has been backported to 2.4 and is resolved. The unresolved dangling issue is a separate issue involving a different class, and is being tracked with
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||