Lucene - Core: LUCENE-2013

QueryScorer and SpanRegexQuery are incompatible.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 2.9.1, 3.0
    • Component/s: modules/highlighter
    • Labels:
      None
    • Environment:

      Lucene-Java 2.9

    • Lucene Fields:
      New, Patch Available

      Description

      Since the resolution of #LUCENE-1685, users are not supposed to rewrite their queries before submitting them to QueryScorer:

      ------------------------------------------------------------------------

      r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line

      LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting. The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default. If you were previously rewritting the query for multi-term query highlighting, you should no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer) has also been improved to more closely match the API of the previous QueryScorer implementation.

      ------------------------------------------------------------------------

      This is a great convenience for the most part, but it's causing me difficulties with SpanRegexQuery. The WeightedSpanTermExtractor uses Query.extractTerms() to collect the fields used in the query, but SpanRegexQuery does not implement this method, so highlighting any query that contains a SpanRegexQuery throws an UnsupportedOperationException. Even if this is circumvented, SpanRegexQuery still throws an exception when someone calls its getSpans() method.
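
      The failure described above can be modelled without Lucene on the classpath. The sketch below is illustrative only: DemoQuery, DemoTermQuery, DemoRegexQuery and fieldsViaTerms() are hypothetical stand-ins, not the Lucene classes, and they assume that the base query's extractTerms() throws unless a subclass overrides it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a query base class (NOT the Lucene Query class).
// Assumption: extractTerms() is unsupported unless a subclass overrides it.
abstract class DemoQuery {
    void extractTerms(Set<String> terms) {
        throw new UnsupportedOperationException(getClass().getSimpleName());
    }
}

// A single-term query knows its term up front, so it can override extractTerms().
class DemoTermQuery extends DemoQuery {
    final String field, text;
    DemoTermQuery(String field, String text) { this.field = field; this.text = text; }
    @Override void extractTerms(Set<String> terms) { terms.add(field + ":" + text); }
}

// A regex query cannot list its terms before it is rewritten against an index,
// so in this model it simply inherits the throwing default.
class DemoRegexQuery extends DemoQuery {
    final String field, pattern;
    DemoRegexQuery(String field, String pattern) { this.field = field; this.pattern = pattern; }
}

class ExtractTermsFailureDemo {
    // Naive field discovery: ask the query for its terms, then read the fields off them.
    static Set<String> fieldsViaTerms(DemoQuery q) {
        Set<String> terms = new HashSet<>();
        q.extractTerms(terms);
        Set<String> fields = new HashSet<>();
        for (String t : terms) fields.add(t.substring(0, t.indexOf(':')));
        return fields;
    }

    // Reports whether field discovery blows up for the given query.
    static boolean failsOn(DemoQuery q) {
        try { fieldsViaTerms(q); return false; }
        catch (UnsupportedOperationException e) { return true; }
    }

    public static void main(String[] args) {
        System.out.println(fieldsViaTerms(new DemoTermQuery("body", "lucene")));
        System.out.println(failsOn(new DemoRegexQuery("body", "luc.*")));
    }
}
```

      In this model the field is known to the regex query all along; it is only the detour through extractTerms() that makes field discovery fail.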

      I can provide the patch that I am currently using, but I'm not sure that my solution is optimal. It adds two methods to SpanQuery. The first, extractFields(Set<String> fields), is equivalent to fields.add(getField()) except when MaskedFieldQuerys get involved. The second, mustBeRewrittenToGetSpans(), returns true for SpanQuery, false for SpanTermQuery, and is overridden in each composite SpanQuery to return a value that depends on its components. In this way SpanRegexQuery (and any other custom SpanQuery) does not need to be adjusted.
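
      A rough sketch of how those two methods could look, using small stand-in classes rather than the Lucene ones. All Sketch* names are hypothetical, and two details are assumptions that the description does not spell out: the masking query delegates extractFields() to the query it wraps, and a composite needs rewriting when any of its clauses does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for SpanQuery carrying the two proposed methods.
abstract class SketchSpanQuery {
    abstract String getField();
    // Default: a query contributes exactly its own field.
    void extractFields(Set<String> fields) { fields.add(getField()); }
    // Default: assume a rewrite is needed, so unknown custom queries stay safe.
    boolean mustBeRewrittenToGetSpans() { return true; }
}

// A plain term query never needs rewriting.
class SketchTermQuery extends SketchSpanQuery {
    private final String field;
    SketchTermQuery(String field) { this.field = field; }
    @Override String getField() { return field; }
    @Override boolean mustBeRewrittenToGetSpans() { return false; }
}

// A regex query inherits both defaults and needs no changes of its own.
class SketchRegexQuery extends SketchSpanQuery {
    private final String field;
    SketchRegexQuery(String field) { this.field = field; }
    @Override String getField() { return field; }
}

// Assumption: a masking query reports the mask from getField() but hands
// field extraction and the rewrite check to the wrapped query.
class SketchMaskingQuery extends SketchSpanQuery {
    private final SketchSpanQuery inner;
    private final String maskField;
    SketchMaskingQuery(SketchSpanQuery inner, String maskField) {
        this.inner = inner;
        this.maskField = maskField;
    }
    @Override String getField() { return maskField; }
    @Override void extractFields(Set<String> fields) { inner.extractFields(fields); }
    @Override boolean mustBeRewrittenToGetSpans() { return inner.mustBeRewrittenToGetSpans(); }
}

// Composite: both answers depend on the clauses.
class SketchNearQuery extends SketchSpanQuery {
    private final List<SketchSpanQuery> clauses;
    SketchNearQuery(SketchSpanQuery... clauses) { this.clauses = Arrays.asList(clauses); }
    @Override String getField() { return clauses.get(0).getField(); }
    @Override void extractFields(Set<String> fields) {
        for (SketchSpanQuery c : clauses) c.extractFields(fields);
    }
    @Override boolean mustBeRewrittenToGetSpans() {
        for (SketchSpanQuery c : clauses) if (c.mustBeRewrittenToGetSpans()) return true;
        return false;
    }
}

class SpanQueryMethodsDemo {
    static Set<String> fieldsOf(SketchSpanQuery q) {
        Set<String> fields = new TreeSet<>();
        q.extractFields(fields);
        return fields;
    }

    public static void main(String[] args) {
        SketchSpanQuery near = new SketchNearQuery(
            new SketchTermQuery("body"),
            new SketchMaskingQuery(new SketchRegexQuery("title"), "body"));
        System.out.println(fieldsOf(near) + " rewrite=" + near.mustBeRewrittenToGetSpans());
    }
}
```

      Defaulting mustBeRewrittenToGetSpans() to true is what lets a custom query work unchanged: only queries known to be safe opt out.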

      Currently the collection of fields and non-weighted terms is done in a single step. In the proposed patch the WeightedSpanTerm extraction from a SpanQuery proceeds in two steps. First, if the QueryScorer's field is null, the fields are collected from the SpanQuery using the extractFields() method. Second, the terms are collected using extractTerms(), rewriting the query for each field if mustBeRewrittenToGetSpans() returns true.
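
      The two-step flow might be sketched as follows. This is a simplified model under stated assumptions, not the patch itself: SpanQ, TermQ, OrQ, RegexQ and TwoStepExtractor are hypothetical names, the "index" is just a map from field name to its indexed terms, and rewrite() is a stand-in that expands a regex into the terms it matches in one field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical stand-ins, not the Lucene classes.
abstract class SpanQ {
    abstract String getField();
    void extractFields(Set<String> fields) { fields.add(getField()); }
    boolean mustBeRewrittenToGetSpans() { return true; }
    // Rewrite against the terms indexed for one field; a no-op by default.
    SpanQ rewrite(String field, List<String> indexedTerms) { return this; }
    void extractTerms(Set<String> terms) { throw new UnsupportedOperationException(); }
}

class TermQ extends SpanQ {
    final String field, text;
    TermQ(String field, String text) { this.field = field; this.text = text; }
    @Override String getField() { return field; }
    @Override boolean mustBeRewrittenToGetSpans() { return false; }
    @Override void extractTerms(Set<String> terms) { terms.add(field + ":" + text); }
}

class OrQ extends SpanQ {
    final String field;
    final List<SpanQ> clauses;
    OrQ(String field, List<SpanQ> clauses) { this.field = field; this.clauses = clauses; }
    @Override String getField() { return field; }
    @Override void extractFields(Set<String> fields) {
        for (SpanQ c : clauses) c.extractFields(fields);
    }
    @Override boolean mustBeRewrittenToGetSpans() {
        for (SpanQ c : clauses) if (c.mustBeRewrittenToGetSpans()) return true;
        return false;
    }
    @Override SpanQ rewrite(String f, List<String> indexedTerms) {
        List<SpanQ> out = new ArrayList<>();
        for (SpanQ c : clauses) out.add(c.rewrite(f, indexedTerms));
        return new OrQ(field, out);
    }
    @Override void extractTerms(Set<String> terms) {
        for (SpanQ c : clauses) c.extractTerms(terms);
    }
}

class RegexQ extends SpanQ {
    final String field;
    final Pattern pattern;
    RegexQ(String field, String regex) { this.field = field; this.pattern = Pattern.compile(regex); }
    @Override String getField() { return field; }
    // Expands to the matching terms for its own field, and to an empty
    // disjunction when rewritten against any other field.
    @Override SpanQ rewrite(String f, List<String> indexedTerms) {
        List<SpanQ> matches = new ArrayList<>();
        if (field.equals(f))
            for (String t : indexedTerms)
                if (pattern.matcher(t).matches()) matches.add(new TermQ(field, t));
        return new OrQ(field, matches);
    }
}

class TwoStepExtractor {
    static Set<String> extract(SpanQ query, String scorerField, Map<String, List<String>> index) {
        // Step 1: collect fields from the query only when the scorer has none.
        Set<String> fields = new TreeSet<>();
        if (scorerField == null) query.extractFields(fields); else fields.add(scorerField);
        // Step 2: collect terms, rewriting once per field only when required.
        Set<String> terms = new TreeSet<>();
        boolean needsRewrite = query.mustBeRewrittenToGetSpans();
        for (String f : fields) {
            List<String> indexed = index.get(f);
            if (indexed == null) indexed = new ArrayList<>();
            SpanQ q = needsRewrite ? query.rewrite(f, indexed) : query;
            q.extractTerms(terms);
        }
        return terms;
    }

    static Set<String> demo() {
        Map<String, List<String>> index = new HashMap<>();
        index.put("body", Arrays.asList("lucene", "search"));
        index.put("title", Arrays.asList("highlighter", "highlight", "scorer"));
        SpanQ query = new OrQ("body", Arrays.<SpanQ>asList(
            new TermQ("body", "lucene"), new RegexQ("title", "high.*")));
        return extract(query, null, index);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

      Because the rewritten query contains only term and composite stand-ins, extractTerms() never reaches the throwing default, which is the point of checking mustBeRewrittenToGetSpans() first.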

      1. LUCENE-2013.patch
        5 kB
        Mark Miller
      2. lucene-2013-2009-10-28.patch
        7 kB
        Benjamin Keil
      3. lucene-2013-2009-10-28-2135.patch
        7 kB
        Benjamin Keil
      4. lucene-2013-2009-10-29-0136.patch
        7 kB
        Benjamin Keil

        Issue Links

          Activity

          Michael McCandless made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Mark Miller made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Mark Miller made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Mark Miller made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Assignee Mark Miller [ markrmiller@gmail.com ]
          Fix Version/s 2.9.1 [ 12314295 ]
          Resolution Fixed [ 1 ]
          Benjamin Keil made changes -
          Attachment lucene-2013-2009-10-29-0136.patch [ 12423521 ]
          Mark Miller made changes -
          Fix Version/s 3.0 [ 12312889 ]
          Mark Miller made changes -
          Attachment LUCENE-2013.patch [ 12423511 ]
          Benjamin Keil made changes -
          Attachment lucene-2013-2009-10-28-2135.patch [ 12423494 ]
          Benjamin Keil made changes -
          Link This issue is related to LUCENE-1685 [ LUCENE-1685 ]
          Benjamin Keil made changes -
          Attachment lucene-2013-2009-10-28.patch [ 12423493 ]
          Benjamin Keil created issue -

            People

            • Assignee:
              Mark Miller
              Reporter:
              Benjamin Keil
            • Votes:
              0
              Watchers:
              2
