Lucene - Core
  1. Lucene - Core
  2. LUCENE-2123

Move FuzzyQuery rewrite as separate RewriteMode into MTQ, was: Highlighter fails to highlight FuzzyQuery

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/highlighter
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      As FuzzyQuery does not allow to change the rewrite mode, highlighter fails with UOE in flex since LUCENE-2110, because it changes the rewrite mode to Boolean query. The fix is: Allow MTQ to change rewrite method and make FUZZY_REWRITE public for that.

      The rewrite mode will live in MTQ as TOP_TERMS_SCORING_BOOLEAN_REWRITE. Also the code will be refactored to make heavy reuse of term enumeration code and only plug in the PQ for filtering the top terms.

      1. LUCENE-2123.patch
        17 kB
        Uwe Schindler
      2. LUCENE-2123.patch
        16 kB
        Uwe Schindler
      3. LUCENE-2123.patch
        16 kB
        Uwe Schindler
      4. LUCENE-2123.patch
        17 kB
        Uwe Schindler
      5. LUCENE-2123-flex.patch
        19 kB
        Uwe Schindler
      6. LUCENE-2123-flex.patch
        19 kB
        Uwe Schindler
      7. LUCENE-2123-flex.patch
        19 kB
        Uwe Schindler
      8. LUCENE-2123-flex.patch
        20 kB
        Uwe Schindler
      9. LUCENE-2123-flex.patch
        20 kB
        Uwe Schindler
      10. LUCENE-2123-flex.patch
        19 kB
        Uwe Schindler
      11. LUCENE-2123-flex.patch
        13 kB
        Uwe Schindler
      12. LUCENE-2123-lucene30.patch
        6 kB
        Uwe Schindler
      13. LUCENE-2123-lucene30.patch
        4 kB
        Uwe Schindler
      14. LUCENE-2123-lucene30.patch
        4 kB
        Uwe Schindler
      15. LUCENE-2123-lucene30.patch
        3 kB
        Uwe Schindler
      16. LUCENE-2123-ScoreTermRemoval-trunk.patch
        7 kB
        Uwe Schindler
      17. LUCENE-2123-ScoreTermRemoval-trunk.patch
        2 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Simon Willnauer added a comment -

          Could we make FuzzyRewrite a subclass of BooleanQueryRewriteMethod and add a simple check like

          if(!mtq.getRewriteMethod() instanceof BooleanRewrite){
            mtq.setRewriteMethod(BooleanRewriteMethod)
          }
          

          As it rewrites to booleanquery it seems to be ok to subclass though.

          Yet another thing, you throw unsupportedOperationException in FuzzyQuery#setRewriteMethod, we should at least document in MTQ that this is an optional operation. Something like

          /**
            * ... (optional operation)
            * @throws UnsupportedOperationException if this method is not supported by the MultiTermQuery
            */
          
          Show
          Simon Willnauer added a comment - Could we make FuzzyRewrite a subclass of BooleanQueryRewriteMethod and add a simple check like if (!mtq.getRewriteMethod() instanceof BooleanRewrite){ mtq.setRewriteMethod(BooleanRewriteMethod) } As it rewrites to booleanquery it seems to be ok to subclass though. Yet another thing, you throw unsupportedOperationException in FuzzyQuery#setRewriteMethod, we should at least document in MTQ that this is an optional operation. Something like /** * ... (optional operation) * @ throws UnsupportedOperationException if this method is not supported by the MultiTermQuery */
          Hide
          Uwe Schindler added a comment -

          Could we make FuzzyRewrite a subclass of BooleanQueryRewriteMethod and add a simple check like

          The problem is that also ConstantScoreBooleanRewrite subclasses BooleanQueryRewriteMethod.

          Yet another thing, you throw unsupportedOperationException in FuzzyQuery#setRewriteMethod, we should at least document in MTQ that this is an optional operation. Something like

          This issue will remove this an make FUZZY_REWRITE public and maybe move to MTQ (because it is also useful for other boosting term lists to only put the highest ranking erms into the BQ). The name could be something like HIGHEST_SCORING_BOOLEAN_QUERY_REWRITE (brr).

          Show
          Uwe Schindler added a comment - Could we make FuzzyRewrite a subclass of BooleanQueryRewriteMethod and add a simple check like The problem is that also ConstantScoreBooleanRewrite subclasses BooleanQueryRewriteMethod. Yet another thing, you throw unsupportedOperationException in FuzzyQuery#setRewriteMethod, we should at least document in MTQ that this is an optional operation. Something like This issue will remove this an make FUZZY_REWRITE public and maybe move to MTQ (because it is also useful for other boosting term lists to only put the highest ranking erms into the BQ). The name could be something like HIGHEST_SCORING_BOOLEAN_QUERY_REWRITE (brr).
          Hide
          Uwe Schindler added a comment -

          Here a refactoring of the rewrite modes in Flex. I'll port to trunk, too.

          FuzzyQuery now uses per default TOP_TERMS_SCORING_BOOLEAN_REWRITE which is part of MTQ and can now also be used by e.g. MoreLikeThis.

          Show
          Uwe Schindler added a comment - Here a refactoring of the rewrite modes in Flex. I'll port to trunk, too. FuzzyQuery now uses per default TOP_TERMS_SCORING_BOOLEAN_REWRITE which is part of MTQ and can now also be used by e.g. MoreLikeThis.
          Hide
          Uwe Schindler added a comment -

          More refactoring. No also AUTO_REWRITE uses the new TermCollector. It gets less and less code.

          Show
          Uwe Schindler added a comment - More refactoring. No also AUTO_REWRITE uses the new TermCollector. It gets less and less code.
          Hide
          Uwe Schindler added a comment -

          Now i also made the strange anonymous inner class a named inner class to get rid of the strange boolean holder, implemented by an array.

          Show
          Uwe Schindler added a comment - Now i also made the strange anonymous inner class a named inner class to get rid of the strange boolean holder, implemented by an array.
          Hide
          Uwe Schindler added a comment -

          Trunk patch comes soon.

          Show
          Uwe Schindler added a comment - Trunk patch comes soon.
          Hide
          Uwe Schindler added a comment -

          Here the final patches with updated JavaDocs. I want to apply them in this form to trunk and flex. If nobody objects I will do this tomorrow.

          WIth this patch, FuzzyQuery will always highlight correctly, as it can be switched to boolean query rewrite mode.

          Show
          Uwe Schindler added a comment - Here the final patches with updated JavaDocs. I want to apply them in this form to trunk and flex. If nobody objects I will do this tomorrow. WIth this patch, FuzzyQuery will always highlight correctly, as it can be switched to boolean query rewrite mode.
          Hide
          Uwe Schindler added a comment -

          Here is the code as discussed on IRC:
          It fixes the braindead LUCENE-504 code

          Show
          Uwe Schindler added a comment - Here is the code as discussed on IRC: It fixes the braindead LUCENE-504 code
          Hide
          Uwe Schindler added a comment -

          So the last patch for today.

          I optimized the PQ to reuse the ScoreTerm instance when the PQ is full. I think for current FuzzyQuery the rewrite modes are now as best as possible. The tests pass that already test the overflow of the PQ by setting BQ.maxClauseCount to a very low value.

          Show
          Uwe Schindler added a comment - So the last patch for today. I optimized the PQ to reuse the ScoreTerm instance when the PQ is full. I think for current FuzzyQuery the rewrite modes are now as best as possible. The tests pass that already test the overflow of the PQ by setting BQ.maxClauseCount to a very low value.
          Hide
          Uwe Schindler added a comment -

          After sleeping one more night, I added code to not even put the termsinto the PQ, when they are not competitive. More support for automaton query will come only in flex with LUCENE-2140.

          I like to commit this during the day. Thanks for all the support and interesting discussions.

          Show
          Uwe Schindler added a comment - After sleeping one more night, I added code to not even put the termsinto the PQ, when they are not competitive. More support for automaton query will come only in flex with LUCENE-2140 . I like to commit this during the day. Thanks for all the support and interesting discussions.
          Hide
          Michael McCandless added a comment -

          Latest patch looks great Uwe!

          Show
          Michael McCandless added a comment - Latest patch looks great Uwe!
          Hide
          Uwe Schindler added a comment -

          Committed in trunk and flex!

          WARNING: Please do not merge revision #889181 to flex!

          I'll will not merge this to 3.0 branch, as the ScoreTerm is protected and this change need to invert th PQ order, which would break BW compatibility.

          Show
          Uwe Schindler added a comment - Committed in trunk and flex! WARNING: Please do not merge revision #889181 to flex! I'll will not merge this to 3.0 branch, as the ScoreTerm is protected and this change need to invert th PQ order, which would break BW compatibility.
          Hide
          Uwe Schindler added a comment -

          I already closed this issue, but still open is migration to 3.0, where the PQ is somehow broken (can consume too much RAM). There are two possibilities to merge:

          • Apply this patch to 3.0 -> new features, ok for 3.0 branch
          • Only fix PQ issue in FuzzyQuery -> Problem: ScoreDoc is protected and there is the need to change the comparator (inverse). In my opinion the ScoreTerm should be a private class at all (like it is in MTQ now). But we cannot change it (BW compatibility). A solution could be to ignore the comparator and give the inverse comparator in the PQ ctor as param. In trunk, the ScoreTerm class is deprecated and dead code, but has the original comparator.

          What to do?

          Show
          Uwe Schindler added a comment - I already closed this issue, but still open is migration to 3.0, where the PQ is somehow broken (can consume too much RAM). There are two possibilities to merge: Apply this patch to 3.0 -> new features, ok for 3.0 branch Only fix PQ issue in FuzzyQuery -> Problem: ScoreDoc is protected and there is the need to change the comparator (inverse). In my opinion the ScoreTerm should be a private class at all (like it is in MTQ now). But we cannot change it (BW compatibility). A solution could be to ignore the comparator and give the inverse comparator in the PQ ctor as param. In trunk, the ScoreTerm class is deprecated and dead code, but has the original comparator. What to do?
          Hide
          Uwe Schindler added a comment -

          Solution 2 for 3.0 branch that inverts the ScoreTerm comparator with an additional Comparator.

          Show
          Uwe Schindler added a comment - Solution 2 for 3.0 branch that inverts the ScoreTerm comparator with an additional Comparator.
          Hide
          Uwe Schindler added a comment -

          Here the version with BW break. The inner class ScoreTerm in Fuzzy query was never intended to be public. If somebody used it, code will break, but that is not likely to be the case.

          Show
          Uwe Schindler added a comment - Here the version with BW break. The inner class ScoreTerm in Fuzzy query was never intended to be public. If somebody used it, code will break, but that is not likely to be the case.
          Hide
          Uwe Schindler added a comment -

          Removed unused include.

          Show
          Uwe Schindler added a comment - Removed unused include.
          Hide
          Uwe Schindler added a comment -

          3.0 branch + scoring test. Committed revision: 889567

          Show
          Uwe Schindler added a comment - 3.0 branch + scoring test. Committed revision: 889567
          Hide
          Uwe Schindler added a comment -

          Here the same change for trunk. It also contains a test that checks a term range (which has no boost) with the fuzzy rewrite mode and tests that the PQ correctly only returns the lower terms.

          Show
          Uwe Schindler added a comment - Here the same change for trunk. It also contains a test that checks a term range (which has no boost) with the fuzzy rewrite mode and tests that the PQ correctly only returns the lower terms.
          Hide
          Uwe Schindler added a comment -

          Better test for the PQ tie-break case.

          I'll commit soon and also merge to flex (in flex more changes in javadocs needed).

          Show
          Uwe Schindler added a comment - Better test for the PQ tie-break case. I'll commit soon and also merge to flex (in flex more changes in javadocs needed).
          Hide
          Uwe Schindler added a comment -

          To test the comparator and inject a bug:
          remove:

          // ignore uncompetetive hits
          if (stQueue.size() >= maxSize && boost <= stQueue.peek().boost)
            return true;
          

          and inverse the comparator's term order:

          public int compareTo(ScoreTerm other) {
            if (this.boost == other.boost)
               return *-*other.term.compareTo(this.term);
            else
               return Float.compare(this.boost, other.boost);
          }
          
          Show
          Uwe Schindler added a comment - To test the comparator and inject a bug: remove: // ignore uncompetetive hits if (stQueue.size() >= maxSize && boost <= stQueue.peek().boost) return true ; and inverse the comparator's term order: public int compareTo(ScoreTerm other) { if ( this .boost == other.boost) return *-*other.term.compareTo( this .term); else return Float .compare( this .boost, other.boost); }

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development