  1. Lucene - Core
  2. LUCENE-7284

UnsupportedOperationException wrt SpanNearQuery with Gap (Needed for Synonym Query Expansion)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, 6.0.1, 5.5.2, 5.6
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I am trying to support synonyms on the query side by doing
      query expansion.

      For example, the query "open webpage" can be expanded if the following
      things are synonyms:

      "open" | "go to"

      With both the stop word filter and the stemming filter enabled, this
      becomes the following:

      spanNear(
               [
                       spanOr([Title:open, Title:go]),
                       Title:webpag
               ],
               0,
               true
      )
      

      Notice that "go to" became just "go", because apparently "to" is removed
      by the stop word filter.

      Interestingly, if you turn "go to webpage" into a phrase, you get "go ?
      webpage", but if you turn "go to" into a phrase, you just get "go",
      because apparently a trailing stop word in a PhraseQuery gets dropped.
      (There is currently no way to represent that trailing gap: a PhraseQuery
      represents gaps implicitly via the positions of its tokens, and with no
      second token there is nothing to indicate that a gap follows.)

      The above query then fails to match "go to webpage", because "go to
      webpage" in the index tokenizes as "go _ webpage", and the query,
      having lost its gap, only tries to match "go webpage".

      To try and work around that, I represent "go to" not as a phrase, but as
      a SpanNearQuery, like this:

      spanNear(
               [
                       spanOr(
                               [
                                       Title:open,
                                       spanNear([Title:go, SpanGap(:1)], 0, true),
                               ]
                       ),
                       Title:webpag
               ],
               0,
               true
      )
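
To illustrate why the explicit gap matters, here is a minimal, self-contained Java sketch. It does not use Lucene: `analyze`, `matchesInOrder`, the one-word stop list, and the single stemming rule are hypothetical stand-ins chosen only to mirror the example above. It models a stop word that is removed but still consumes a position, and an ordered, slop-0 match in which a `null` pattern element plays the role of a one-position gap.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class GapMatchSketch {

    // Hypothetical stand-in for an analyzer chain (assumption, not Lucene code):
    // stop words are dropped, but the position they occupied is preserved.
    static Map<Integer, String> analyze(String text) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("to"));       // assumed stop list
        Map<String, String> stems =
                Collections.singletonMap("webpage", "webpag");            // assumed stemmer rule
        Map<Integer, String> positions = new TreeMap<>();
        int pos = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!stopWords.contains(token)) {
                positions.put(pos, stems.getOrDefault(token, token));
            }
            pos++; // a removed stop word still consumes a position
        }
        return positions;
    }

    // Ordered, slop-0 match: pattern elements must sit at consecutive positions.
    // A null element is a one-position gap and matches whatever is (or is not) there.
    // Simplification: the pattern is assumed to start with a real term.
    static boolean matchesInOrder(Map<Integer, String> doc, List<String> pattern) {
        for (int start : doc.keySet()) {
            boolean ok = true;
            for (int i = 0; i < pattern.size() && ok; i++) {
                String wanted = pattern.get(i);
                ok = wanted == null || wanted.equals(doc.get(start + i));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, String> doc = analyze("go to webpage");
        System.out.println(doc);
        // Without the gap, "go" and "webpag" are required to be adjacent.
        System.out.println(matchesInOrder(doc, Arrays.asList("go", "webpag")));
        // With a one-position gap after "go", the hole left by "to" is accounted for.
        System.out.println(matchesInOrder(doc, Arrays.asList("go", null, "webpag")));
    }
}
```

In this toy model the first pattern does not match and the second does, which is the behaviour the gap clause in the query above is meant to achieve.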
      

      However, when I run that query, I get the following:

      A Java exception occurred: java.lang.UnsupportedOperationException
           at org.apache.lucene.search.spans.SpanNearQuery$GapSpans.positionsCost(SpanNearQuery.java:398)
           at org.apache.lucene.search.spans.ConjunctionSpans.asTwoPhaseIterator(ConjunctionSpans.java:96)
           at org.apache.lucene.search.spans.NearSpansOrdered.asTwoPhaseIterator(NearSpansOrdered.java:45)
           at org.apache.lucene.search.spans.ScoringWrapperSpans.asTwoPhaseIterator(ScoringWrapperSpans.java:88)
           at org.apache.lucene.search.ConjunctionDISI.addSpans(ConjunctionDISI.java:104)
           at org.apache.lucene.search.ConjunctionDISI.intersectSpans(ConjunctionDISI.java:82)
           at org.apache.lucene.search.spans.ConjunctionSpans.<init>(ConjunctionSpans.java:41)
           at org.apache.lucene.search.spans.NearSpansOrdered.<init>(NearSpansOrdered.java:54)
           at org.apache.lucene.search.spans.SpanNearQuery$SpanNearWeight.getSpans(SpanNearQuery.java:232)
           at org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:134)
           at org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:38)
           at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135)
      

      ... and when I look up that GapSpans class in SpanNearQuery.java, I see:

      @Override
      public float positionsCost() {
         throw new UnsupportedOperationException();
      }
      

      I asked this question on the mailing list on May 14 and was directed to submit a bug here.

      This issue is of relatively high priority for us, since this is the most promising technique we have for supporting synonyms on top of Lucene (the SynonymFilter suffers serious issues wrt multi-word synonyms).

      1. LUCENE-7284.patch
        3 kB
        Alan Woodward

        Activity

        romseygeek Alan Woodward added a comment -

        Here's a fix. The GapSpans tests weren't actually running a search, just pulling Spans directly from a weight, so this bug didn't get picked up when positionsCost() was added.
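
The coverage gap described in this comment can be sketched in plain Java. This is a toy model, not Lucene code: `SimpleSpans`, `term`, `gap`, `countPositions`, and `conjunctionCost` are hypothetical names, and the zero cost returned by the "fixed" gap is an assumption made for illustration rather than a statement of what the patch contains. The point is only that code which iterates positions never calls `positionsCost()`, while a conjunction that sums its children's costs (as the search path in the stack trace does) trips over an unimplemented one.

```java
import java.util.Arrays;
import java.util.List;

public class PositionsCostSketch {

    // Hypothetical, heavily simplified stand-in for a spans iterator.
    interface SimpleSpans {
        int nextStartPosition();   // returns NO_MORE when exhausted
        float positionsCost();     // estimated cost of checking positions
    }

    static final int NO_MORE = Integer.MAX_VALUE;

    // A term-like iterator over fixed positions; its cost is the number of positions.
    static SimpleSpans term(int... positions) {
        return new SimpleSpans() {
            private int i = 0;
            public int nextStartPosition() {
                return i < positions.length ? positions[i++] : NO_MORE;
            }
            public float positionsCost() {
                return positions.length;
            }
        };
    }

    // A gap-like iterator. With costImplemented == false it mimics the reported bug.
    static SimpleSpans gap(boolean costImplemented) {
        return new SimpleSpans() {
            private boolean used = false;
            public int nextStartPosition() {
                if (used) {
                    return NO_MORE;
                }
                used = true;
                return 0;
            }
            public float positionsCost() {
                if (!costImplemented) {
                    throw new UnsupportedOperationException();
                }
                return 0f; // assumed value for this sketch only
            }
        };
    }

    // "Test-style" access: just pull positions, never ask for a cost.
    static int countPositions(List<SimpleSpans> subs) {
        int count = 0;
        for (SimpleSpans s : subs) {
            while (s.nextStartPosition() != NO_MORE) {
                count++;
            }
        }
        return count;
    }

    // "Search-style" access: a conjunction sums the costs of all its children.
    static float conjunctionCost(List<SimpleSpans> subs) {
        float total = 0f;
        for (SimpleSpans s : subs) {
            total += s.positionsCost();
        }
        return total;
    }

    public static void main(String[] args) {
        List<SimpleSpans> broken = Arrays.asList(term(0, 5), gap(false));
        System.out.println("positions seen: " + countPositions(broken)); // works fine
        try {
            conjunctionCost(broken);
        } catch (UnsupportedOperationException e) {
            System.out.println("search path failed: positionsCost() not implemented");
        }
        List<SimpleSpans> fixed = Arrays.asList(term(0, 5), gap(true));
        System.out.println("conjunction cost: " + conjunctionCost(fixed));
    }
}
```

A test that only exercises something like `countPositions` passes in both cases, which is why running an actual search is needed to surface this kind of failure.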

        danielbigham Daniel Bigham added a comment -

        Wow, I'm really impressed with the turnaround time on this. Thanks so much Alan.

        danielbigham Daniel Bigham added a comment -

        By the way, what is the right protocol for taking a fix like this and getting updated Lucene 5.5 JARs? I presume it's:

        1. Download the 5.5 source.
        2. Apply the patch.
        3. Build Lucene.
        4. Use the updated JARs.

        Is that the right set of steps?

        romseygeek Alan Woodward added a comment -

        Yes, that should work. The patch is for 6.1, but I don't think anything significant here has changed since 5.5 so it ought to apply cleanly.

        danielbigham Daniel Bigham added a comment -

        Confirmed the fix. My synonym expansion strategy now appears to work as hoped. A big thank you to Alan!

        romseygeek Alan Woodward added a comment -

        Hi Daniel, we normally wait till the fix is committed before resolving the issue - I'll probably commit tomorrow morning. Thanks for testing!

        danielbigham Daniel Bigham added a comment -

        Whoops, my apologies.

        jira-bot ASF subversion and git services added a comment -

        Commit cfc13f5b67e9d34c3bf3a6f3773b47f05e2b4527 in lucene-solr's branch refs/heads/master from Alan Woodward
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cfc13f5 ]

        LUCENE-7284: GapSpans needs to implement positionsCost()

        jira-bot ASF subversion and git services added a comment -

        Commit 53d96705e9d906dabf1ac03182b5b8a4c184c441 in lucene-solr's branch refs/heads/branch_6x from Alan Woodward
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=53d9670 ]

        LUCENE-7284: GapSpans needs to implement positionsCost()

        romseygeek Alan Woodward added a comment -

        Thanks Daniel!

        steve_rowe Steve Rowe added a comment -

        Reopening to backport to 6.0.1.

        jira-bot ASF subversion and git services added a comment -

        Commit a4732370a7fa217800307c14c4cc398f50a7b67d in lucene-solr's branch refs/heads/branch_6_0 from Alan Woodward
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a473237 ]

        LUCENE-7284: GapSpans needs to implement positionsCost()

        steve_rowe Steve Rowe added a comment -

        Bulk close issues included in the 6.0.1 release.

        steve_rowe Steve Rowe added a comment -

        Reopening to backport to 5.6 and 5.5.2.

        jira-bot ASF subversion and git services added a comment -

        Commit 3e5832291b807a9b9b6271d8fd990678f27a83c4 in lucene-solr's branch refs/heads/branch_5_5 from Alan Woodward
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3e58322 ]

        LUCENE-7284: GapSpans needs to implement positionsCost()

        jira-bot ASF subversion and git services added a comment -

        Commit fa9940b3e3ab9955a26dfe30839d591b7703a8c4 in lucene-solr's branch refs/heads/branch_5x from Alan Woodward
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fa9940b ]

        LUCENE-7284: GapSpans needs to implement positionsCost()


          People

          • Assignee: romseygeek Alan Woodward
          • Reporter: danielbigham Daniel Bigham
          • Votes: 0
          • Watchers: 3
