Lucene - Core
  1. Lucene - Core
  2. LUCENE-1465

NearSpansOrdered.getPayload does not return the payload from the minimum match span

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.4.1, 2.9
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available
    1. LUCENE-1465.patch
      7 kB
      Mark Miller
    2. LUCENE-1465.patch
      7 kB
      Mark Miller
    3. LUCENE-1465.patch
      8 kB
      Mark Miller
    4. LUCENE-1465.patch
      10 kB
      Mark Miller
    5. Test.java
      3 kB
      Jonathan Mamou

      Issue Links

        Activity

        Hide
        Mark Miller added a comment -

        Fix + test

        Show
        Mark Miller added a comment - Fix + test
        Hide
        Mark Miller added a comment -

        See LUCENE-1001 for discussion of the bug.

        Show
        Mark Miller added a comment - See LUCENE-1001 for discussion of the bug.
        Hide
        Mark Miller added a comment -

        I plan on committing this soon. This is a real deal breaker if you are trying to use the new getPayload API with ordered nearspans.

        The attached path has java 1.5 code in the test which I'll remove.

        Show
        Mark Miller added a comment - I plan on committing this soon. This is a real deal breaker if you are trying to use the new getPayload API with ordered nearspans. The attached path has java 1.5 code in the test which I'll remove.
        Hide
        Mark Miller added a comment -

        Bah. Its even worse than that. Even after you get down to a min match, it might not meet the slop requirements! You have to load the payloads and then dump them if the slop is not met.

        I don't like all this extra payload loading. Come to think of it, if you don't use the getPayload, your still paying for it! I don't have a way around it, but I don't like it. In this case, not only do you pay for loading, you also pay for loading the payloads of a bunch of possible matches that don't end up being a match!

        Over a large index with lots of hits, its a lot of payloads to load...

        I havn't thought about any of it at a high level, but I think this has to be addressed somehow...maybe you have to turn on payload collecting first, or it doesnt do it? We need something...

        but until then, I think this still has to be fixed, and we are loading them one way or another now...might as well add a few more "possible" wrong loads (this last patch added a couple as well) to make the behavior correct - somewhat useless otherwise

        Show
        Mark Miller added a comment - Bah. Its even worse than that. Even after you get down to a min match, it might not meet the slop requirements! You have to load the payloads and then dump them if the slop is not met. I don't like all this extra payload loading. Come to think of it, if you don't use the getPayload, your still paying for it! I don't have a way around it, but I don't like it. In this case, not only do you pay for loading, you also pay for loading the payloads of a bunch of possible matches that don't end up being a match! Over a large index with lots of hits, its a lot of payloads to load... I havn't thought about any of it at a high level, but I think this has to be addressed somehow...maybe you have to turn on payload collecting first, or it doesnt do it? We need something... but until then, I think this still has to be fixed, and we are loading them one way or another now...might as well add a few more "possible" wrong loads (this last patch added a couple as well) to make the behavior correct - somewhat useless otherwise
        Hide
        Mark Miller added a comment -

        That still wasn't quite right. A third test and a third fix. I am pretty sure this solves it, but my previous concerns still concern me.

        Show
        Mark Miller added a comment - That still wasn't quite right. A third test and a third fix. I am pretty sure this solves it, but my previous concerns still concern me.
        Hide
        Mark Miller added a comment -

        Thanks Jonathan and Greg!

        Show
        Mark Miller added a comment - Thanks Jonathan and Greg!
        Hide
        Jonathan Mamou added a comment -

        Hi
        It seems that the fix does not cover the case where 2 terms are indexed at the same position.
        I attach a sample program illustrating the issue. Each 2 terms are indexed at the same position.
        Best regards,
        Jonathan

        Show
        Jonathan Mamou added a comment - Hi It seems that the fix does not cover the case where 2 terms are indexed at the same position. I attach a sample program illustrating the issue. Each 2 terms are indexed at the same position. Best regards, Jonathan
        Hide
        Michael McCandless added a comment -

        Let's backport fix to 2.4 branch (for eventual 2.4.1).

        Show
        Michael McCandless added a comment - Let's backport fix to 2.4 branch (for eventual 2.4.1).
        Hide
        Mark Miller added a comment -

        Whats involved in a backport - just commit it to the 2.4 branch and thats all?

        Looks like I have to look into terms indexed at the same position first - I'll try to get to that soon.

        • Mark
        Show
        Mark Miller added a comment - Whats involved in a backport - just commit it to the 2.4 branch and thats all? Looks like I have to look into terms indexed at the same position first - I'll try to get to that soon. Mark
        Hide
        Michael McCandless added a comment -

        Whats involved in a backport - just commit it to the 2.4 branch and thats all?

        Yup. "svn merge" works well as long as the code hasn't diverged much, eg running this in a 2.4 branch checkout:

        svn merge -r(N-1):N https://svn.apache.org/repos/asf/lucene/java/trunk
        

        where N was the revision committed to trunk.

        Show
        Michael McCandless added a comment - Whats involved in a backport - just commit it to the 2.4 branch and thats all? Yup. "svn merge" works well as long as the code hasn't diverged much, eg running this in a 2.4 branch checkout: svn merge -r(N-1):N https: //svn.apache.org/repos/asf/lucene/java/trunk where N was the revision committed to trunk.
        Hide
        Mark Miller added a comment -

        This is an odd one Jonathan. Its actually for the unordered case (the others were for the ordered). I am not exactly clear on whats going on yet.

        When I look at the payloads coming back, it would seem we are get 0,7,7 when we should get 6,7,7. When I look at the offsets for the spans that I get the payloads from though - they appear correct. Its returning the payloads from the right offsets it seems, but somehow one of those payloads is from the term at position 0? Very odd. So when I debug in, it does indeed look like the first match happens at index 6...but the term offsets are start: 2147483647, end:-2147483648. What the heck? This is going to take some more time...

        Show
        Mark Miller added a comment - This is an odd one Jonathan. Its actually for the unordered case (the others were for the ordered). I am not exactly clear on whats going on yet. When I look at the payloads coming back, it would seem we are get 0,7,7 when we should get 6,7,7. When I look at the offsets for the spans that I get the payloads from though - they appear correct. Its returning the payloads from the right offsets it seems, but somehow one of those payloads is from the term at position 0? Very odd. So when I debug in, it does indeed look like the first match happens at index 6...but the term offsets are start: 2147483647, end:-2147483648. What the heck? This is going to take some more time...
        Hide
        Jonathan Mamou added a comment -

        Mark, I would expect to get 0,0,3,6,7,7 and not only 6,7,7.

        As you wrote, "a SpanAndQuery could easily be a SpanNearQuery if a huge distance was allowed." at http://www.gossamer-threads.com/lists/lucene/java-user/51983

        Show
        Jonathan Mamou added a comment - Mark, I would expect to get 0,0,3,6,7,7 and not only 6,7,7. As you wrote, "a SpanAndQuery could easily be a SpanNearQuery if a huge distance was allowed." at http://www.gossamer-threads.com/lists/lucene/java-user/51983
        Hide
        Mark Miller added a comment -

        Hmmm...I think thats true, but thats for finding 'a hit' on a document, not for finding every possible sequence of spans that could cause hit. Spans work by finding a minimum match, not greedily finding every match (which is a different algorithm).

        Show
        Mark Miller added a comment - Hmmm...I think thats true, but thats for finding 'a hit' on a document, not for finding every possible sequence of spans that could cause hit. Spans work by finding a minimum match, not greedily finding every match (which is a different algorithm).
        Hide
        Mark Miller added a comment -

        This has been backported to 2.4 and is resolved. The unresolved dangling issue is a separate issue involving a different class, and is being tracked with LUCENE-1542.

        Show
        Mark Miller added a comment - This has been backported to 2.4 and is resolved. The unresolved dangling issue is a separate issue involving a different class, and is being tracked with LUCENE-1542 .

          People

          • Assignee:
            Mark Miller
            Reporter:
            Mark Miller
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development