Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      It is sometimes desirable to ignore differences between index statistics of several terms so that they produce the same scores, for instance if you resolve synonyms at search time or if you want to search across several fields. Elasticsearch has been using this approach for its multi_match query for some time now.

      We already blend statistics in TopTermsBlendedFreqScoringRewrite (used by FuzzyQuery) but it could be helpful to have a dedicated query to choose manually which terms to blend stats from.

      1. LUCENE-6695.patch
        24 kB
        Adrien Grand
      2. LUCENE-6695.patch
        24 kB
        Adrien Grand

        Activity

        Adrien Grand added a comment -

        Here is a patch: it computes the aggregated doc freq from several terms as the maximum doc freq, and the total term freq as the sum of the total term freqs of individual terms.
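
The aggregation described above can be sketched in plain Java. This is only an illustration of the arithmetic (maximum of the doc freqs, sum of the total term freqs), not the code from the patch: the class `BlendedStatsSketch`, the holder `TermStats`, and the sample numbers are all hypothetical stand-ins for Lucene's real per-term statistics.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: plain-Java stand-in for blending per-term index statistics. */
final class BlendedStatsSketch {

    /** Hypothetical holder for the two statistics being blended. */
    static final class TermStats {
        final int docFreq;         // number of documents that contain the term
        final long totalTermFreq;  // total number of occurrences across all documents

        TermStats(int docFreq, long totalTermFreq) {
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /** Blends statistics: maximum of the doc freqs, sum of the total term freqs. */
    static TermStats blend(List<TermStats> stats) {
        int maxDocFreq = 0;
        long sumTotalTermFreq = 0;
        for (TermStats s : stats) {
            maxDocFreq = Math.max(maxDocFreq, s.docFreq);
            sumTotalTermFreq += s.totalTermFreq;
        }
        return new TermStats(maxDocFreq, sumTotalTermFreq);
    }

    public static void main(String[] args) {
        // Made-up statistics for three synonyms.
        TermStats blended = blend(Arrays.asList(
                new TermStats(120, 300),
                new TermStats(15, 20),
                new TermStats(40, 55)));
        // Prints "120 375": max(120, 15, 40) and 300 + 20 + 55.
        System.out.println(blended.docFreq + " " + blended.totalTermFreq);
    }
}
```

Every blended term is then scored with the same pair of statistics, which is what makes rare and common synonyms produce the same scores.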

        I put the query in lucene/core so that TopTermsBlendedFreqScoringRewrite could reuse it and marked it as experimental, but if someone is not comfortable with it I can revert the changes to TopTermsBlendedFreqScoringRewrite and move this query to the sandbox.

        Uwe Schindler added a comment -

        Would it not be better to use IndexSearcher.rewrite() inside ComplexPhraseQueryParser? It already does the rewrite loop correctly, so we would not duplicate code: Query rewritten = new IndexSearcher(reader).rewrite(query);

        But I like your funny for-loop

        Otherwise I am fine to have it in core (we have the logic already there, so your proposal to replace the fuzzy rewrite is fine).

        Adrien Grand added a comment -

        Thanks Uwe, I'll do that. Eventually I would like to remove this rewrite() method from the public API of Query; it should really be an implementation detail of createWeight!

        Adrien Grand added a comment -

        Updated patch to apply Uwe's suggestion.

        By the way, the funny loop is the same in IndexSearcher.rewrite.
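
For readers who have not seen it, the loop in question rewrites a query repeatedly until rewriting returns the same instance. The sketch below shows that shape in plain Java; `SketchQuery`, `Primitive`, and `Wrapper` are hypothetical stand-ins rather than Lucene classes, and the real method also takes an IndexReader, which is omitted here.

```java
/** Illustrative only: the rewrite-until-fixed-point loop on stand-in types. */
final class RewriteLoopSketch {

    /** Hypothetical stand-in for a query that can rewrite itself into a simpler form. */
    interface SketchQuery {
        SketchQuery rewrite();
    }

    /** A primitive query: rewriting returns the same instance. */
    static final class Primitive implements SketchQuery {
        @Override
        public SketchQuery rewrite() {
            return this;
        }
    }

    /** A wrapper that peels off one level per rewrite call. */
    static final class Wrapper implements SketchQuery {
        final SketchQuery inner;

        Wrapper(SketchQuery inner) {
            this.inner = inner;
        }

        @Override
        public SketchQuery rewrite() {
            return inner;
        }
    }

    /** Keeps rewriting until a rewrite returns the instance it was called on. */
    static SketchQuery rewriteFully(SketchQuery original) {
        SketchQuery query = original;
        for (SketchQuery rewritten = query.rewrite();
             rewritten != query;
             rewritten = query.rewrite()) {
            query = rewritten;
        }
        return query;
    }

    public static void main(String[] args) {
        SketchQuery primitive = new Primitive();
        SketchQuery nested = new Wrapper(new Wrapper(primitive));
        // Two levels of wrapping are removed, leaving the primitive query.
        System.out.println(rewriteFully(nested) == primitive);
    }
}
```

The loop relies on reference equality, so a query signals that it is fully rewritten by returning itself.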

        Uwe Schindler added a comment - edited

        "Eventually I would like to remove this rewrite() method from the public API of Query, it should really be an implementation detail of createWeight!"

        I would make rewrite() a protected method in Query and let the default impl of createWeight() call it. If a query does not implement createWeight (so the default impl is used), the default impl does the rewrite loop and calls createWeight on the final query. Currently createWeight throws UOE (UnsupportedOperationException); this would also repair that. Of course the default rewrite impl would need to be fixed (and rewrite should throw UOE by default). A query that implements createWeight would not call rewrite.

        Alternatively, add a "RewriteableQuery" (with a final createWeight doing the loop) that has an abstract rewrite() method...

        That way, no "consumer" of the query would ever call rewrite; they would just call createWeight() before execution.

        (These are just ideas; maybe let's create a separate issue.)
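
One possible reading of this idea, sketched on stand-in types: rewrite() is protected and throws by default, and the default createWeight() rewrites and then delegates to the rewritten query. Nothing here is Lucene code; `SketchQuery`, `SketchWeight`, `TermLike`, and `PrefixLike` are hypothetical names, and the recursion in createWeight() stands in for the rewrite loop.

```java
/** Illustrative only: rewrite() as an implementation detail of createWeight(). */
final class ProtectedRewriteSketch {

    /** Hypothetical stand-in for a Weight. */
    static final class SketchWeight {
        final String description;

        SketchWeight(String description) {
            this.description = description;
        }
    }

    abstract static class SketchQuery {
        /** Not meant for consumers in this sketch; throws unless overridden. */
        protected SketchQuery rewrite() {
            throw new UnsupportedOperationException("this query cannot be rewritten");
        }

        /**
         * Default: rewrite once, then ask the result for its weight. If the result
         * also uses this default, it rewrites again, so the recursion plays the role
         * of the rewrite loop. An overriding rewrite() must not return this.
         */
        public SketchWeight createWeight() {
            return rewrite().createWeight();
        }
    }

    /** Implements createWeight() directly, so rewrite() is never called on it. */
    static final class TermLike extends SketchQuery {
        final String term;

        TermLike(String term) {
            this.term = term;
        }

        @Override
        public SketchWeight createWeight() {
            return new SketchWeight("weight(" + term + ")");
        }
    }

    /** Implements only rewrite(); the inherited createWeight() does the rest. */
    static final class PrefixLike extends SketchQuery {
        final String prefix;

        PrefixLike(String prefix) {
            this.prefix = prefix;
        }

        @Override
        protected SketchQuery rewrite() {
            return new TermLike("expanded:" + prefix);
        }
    }

    public static void main(String[] args) {
        // The consumer only ever calls createWeight().
        System.out.println(new PrefixLike("lu").createWeight().description);
    }
}
```

A query that overrides neither method fails with UnsupportedOperationException when createWeight() is called, which matches the "rewrite should throw UOE by default" part of the idea.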

        Adrien Grand added a comment -

        "I would make rewrite() a protected method"

        I think we should do that. The only purpose of rewriting a query is to create a Weight, so we need to reduce the visibility of this method or remove it (trickier). Also I think it's very trappy today that createWeight is not functional unless you have a rewritten query.

        I think there are several issues with RewriteableQuery: for instance, compound queries would no longer be able to rewrite their inner queries, and we also have several queries that implement both rewrite() and createWeight().

        Adrien Grand added a comment -

        Oops, my bad: I thought we could still rewrite sub-queries if the method was protected, but that would only work for queries defined in the oal.search (org.apache.lucene.search) package.

        ASF subversion and git services added a comment -

        Commit 1692848 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1692848 ]

        LUCENE-6695: Added BlendedTermQuery.

        ASF subversion and git services added a comment -

        Commit 1692864 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1692864 ]

        LUCENE-6695: Added BlendedTermQuery.

        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release


          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0
            Watchers:
            3