Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Now that we have weight-level control of whether scoring is needed or not, we could add a new clause type to BooleanQuery. It would behave like MUST exept that it would not participate in scoring.

      Why do we need it given that we already have FilteredQuery? The idea is that by having a single query that performs conjunctions, we could potentially take better decisions. It's not ready to replace FilteredQuery yet as FilteredQuery has handling of random-access filters that BooleanQuery doesn't, but it's a first step towards that direction and eventually FilteredQuery would just rewrite to a BooleanQuery.

      I've been calling this new clause type FILTER so far, but feel free to propose a better name.

      1. LUCENE-6227.patch
        30 kB
        Adrien Grand
      2. LUCENE-6227.patch
        109 kB
        Adrien Grand
      3. LUCENE-6227.patch
        27 kB
        Adrien Grand
      4. LUCENE-6227.patch
        24 kB
        Adrien Grand
      5. LUCENE-6227.patch
        24 kB
        Adrien Grand

        Activity

        Hide
        Adrien Grand added a comment -

        Patch. ConjunctionScorer now takes two sets of scorers, a first one which contains required clauses and is used for advancing to the next match, and another one which only takes the scoring clauses and is used when score() is called.

        Show
        Adrien Grand added a comment - Patch. ConjunctionScorer now takes two sets of scorers, a first one which contains required clauses and is used for advancing to the next match, and another one which only takes the scoring clauses and is used when score() is called.
        Hide
        Michael McCandless added a comment -

        This looks wonderful! Does it mean we can remove BooleanFilter? Maybe TermsFilter? Or that can come later; this is already an awesome step.

        Minor silly Englishism: "participate to" -> "participate in"

        Show
        Michael McCandless added a comment - This looks wonderful! Does it mean we can remove BooleanFilter? Maybe TermsFilter? Or that can come later; this is already an awesome step. Minor silly Englishism: "participate to" -> "participate in"
        Hide
        Uwe Schindler added a comment -

        JUHUUUUUUU!

        Show
        Uwe Schindler added a comment - JUHUUUUUUU!
        Hide
        Uwe Schindler added a comment -

        If we now manage to nuke the whole Filter class (or make it just a specialization of Query that supports caching + random access), we can clean up more! This is a great step forwards.

        As far as I see, currently you cannot add a Filter to BooleanQuery, you still have to wrap it - because BooleanClause needs a "Query" object. This would be solved if above becomes true.

        Show
        Uwe Schindler added a comment - If we now manage to nuke the whole Filter class (or make it just a specialization of Query that supports caching + random access), we can clean up more! This is a great step forwards. As far as I see, currently you cannot add a Filter to BooleanQuery, you still have to wrap it - because BooleanClause needs a "Query" object. This would be solved if above becomes true.
        Hide
        Adrien Grand added a comment -

        Yes. I don't have concrete plans yet, but the only thing that filters have but queries don't is random-access. For instance FilteredQuery today has a not-so-bad way to execute doc-values filters by advancing the query first and then checking the filter, we don't have this on BooleanQuery today. But I think something like LUCENE-6198 could be a better alternative, ie. doc-values filters would return a match-all iterator and do the heavy work in confirm() and boolean queries would work on it just fine. So once this kind of stuff is in, I think we'll be able to remove filters?

        Show
        Adrien Grand added a comment - Yes. I don't have concrete plans yet, but the only thing that filters have but queries don't is random-access. For instance FilteredQuery today has a not-so-bad way to execute doc-values filters by advancing the query first and then checking the filter, we don't have this on BooleanQuery today. But I think something like LUCENE-6198 could be a better alternative, ie. doc-values filters would return a match-all iterator and do the heavy work in confirm() and boolean queries would work on it just fine. So once this kind of stuff is in, I think we'll be able to remove filters?
        Hide
        Adrien Grand added a comment -

        New patch with fixed frenglishism. I tried to remove BooleanFilter and replace it with BooleanQuery but it makes lucene/facets angry because the filter does not implement bits() anymore (look for "fastMatchFilter does not implement DocIdSet.bits" in DoubleRange) so I'd like to delay it to another issue.

        Show
        Adrien Grand added a comment - New patch with fixed frenglishism. I tried to remove BooleanFilter and replace it with BooleanQuery but it makes lucene/facets angry because the filter does not implement bits() anymore (look for "fastMatchFilter does not implement DocIdSet.bits" in DoubleRange) so I'd like to delay it to another issue.
        Hide
        Michael McCandless added a comment -

        New patch with fixed frenglishism.

        Thanks, love that term frenglishism

        I tried to remove BooleanFilter and replace it with BooleanQuery but it makes lucene/facets angry because the filter does not implement bits()

        Oh, grrr .... +1 to delay.

        Show
        Michael McCandless added a comment - New patch with fixed frenglishism. Thanks, love that term frenglishism I tried to remove BooleanFilter and replace it with BooleanQuery but it makes lucene/facets angry because the filter does not implement bits() Oh, grrr .... +1 to delay.
        Hide
        Robert Muir added a comment -

        ConjunctionScorer's required bucket could now be DISI, in that it doesn't need Scorer right? It only scores 'requiredScoring'.

        As far as the new Occur, i like it, but I think its a little confusing that it has the same toString() impl as MUST. Can these be different? It would be nice to think about e.g. adding queryparser support for this Occur in the future.

        Show
        Robert Muir added a comment - ConjunctionScorer's required bucket could now be DISI, in that it doesn't need Scorer right? It only scores 'requiredScoring'. As far as the new Occur, i like it, but I think its a little confusing that it has the same toString() impl as MUST. Can these be different? It would be nice to think about e.g. adding queryparser support for this Occur in the future.
        Hide
        Adrien Grand added a comment -

        New patch:

        • takes '#' for filter clauses instead of '+' like MUST
        • ConjunctionScorer takes DocIdSetIterators instead of Scorers for the intersection part.
        Show
        Adrien Grand added a comment - New patch: takes '#' for filter clauses instead of '+' like MUST ConjunctionScorer takes DocIdSetIterators instead of Scorers for the intersection part.
        Hide
        Hoss Man added a comment -

        two tangential thoughts/questions...

        1) From an API/conceptual standpoint, does it make more sense for this to be a new "Occur" instance (the Occur.FILTER here) or would it make more sense for this to be a property on BooleanClause that could be set to true with either MUST or MUST_NOT clauses?

        2) Assuming it's a new Occur.FILTER, should we plan on renaming Occur.MUST_NOT to something like Occur.FILTER_NEGATION since (unless i'm missunderstanding something) the "non-scoring" semantics of Occur.FILTER and Occur.MUST_NOT are basicly the inverse of each other right? so it seems like we should probably do something ot make it more clear that Occur.MUST_NOT has more in common with FILTER then with MUST ?

        Show
        Hoss Man added a comment - two tangential thoughts/questions... 1) From an API/conceptual standpoint, does it make more sense for this to be a new "Occur" instance (the Occur.FILTER here) or would it make more sense for this to be a property on BooleanClause that could be set to true with either MUST or MUST_NOT clauses? 2) Assuming it's a new Occur.FILTER, should we plan on renaming Occur.MUST_NOT to something like Occur.FILTER_NEGATION since (unless i'm missunderstanding something) the "non-scoring" semantics of Occur.FILTER and Occur.MUST_NOT are basicly the inverse of each other right? so it seems like we should probably do something ot make it more clear that Occur.MUST_NOT has more in common with FILTER then with MUST ?
        Hide
        Adrien Grand added a comment -

        I agree that the names lack symmetry and it would be nice to fix it... I like the idea of renaming MUST_NOT to something like FILTER_NEGATION to make clear that it does not score. Or maybe even shorter, eg. FILTER_NOT?

        Show
        Adrien Grand added a comment - I agree that the names lack symmetry and it would be nice to fix it... I like the idea of renaming MUST_NOT to something like FILTER_NEGATION to make clear that it does not score. Or maybe even shorter, eg. FILTER_NOT?
        Hide
        Adrien Grand added a comment -

        Here is a new patch with improved naming: I renamed MUST_NOT to FILTER_NOT to make it clear that it is conceptually closed to FILTER than to MUST. The patch applies to 5.x and keeps backward compatibility for MUST_NOT.

        Show
        Adrien Grand added a comment - Here is a new patch with improved naming: I renamed MUST_NOT to FILTER_NOT to make it clear that it is conceptually closed to FILTER than to MUST. The patch applies to 5.x and keeps backward compatibility for MUST_NOT.
        Hide
        Robert Muir added a comment -

        Renaming of MUST_NOT is a big scope creep to this issue. I think it should be done elsewhere instead of adding any confusion to this patch. its unrelated. this patch just adds FILTER. completely unrelated to the name of MUST_NOT.

        Show
        Robert Muir added a comment - Renaming of MUST_NOT is a big scope creep to this issue. I think it should be done elsewhere instead of adding any confusion to this patch. its unrelated. this patch just adds FILTER. completely unrelated to the name of MUST_NOT.
        Hide
        Robert Muir added a comment -

        I also personally think MUST_NOT's name is just fine. I dont think adding something else, FILTER makes this confusing at all. I dont think we should rename it.

        Show
        Robert Muir added a comment - I also personally think MUST_NOT's name is just fine. I dont think adding something else, FILTER makes this confusing at all. I dont think we should rename it.
        Hide
        Adrien Grand added a comment -

        Then I'll commit the patch without renaming and open an issue issue to discuss whether we should rename our boolean clauses (if that's ok with everyone).

        Show
        Adrien Grand added a comment - Then I'll commit the patch without renaming and open an issue issue to discuss whether we should rename our boolean clauses (if that's ok with everyone).
        Hide
        Robert Muir added a comment -

        +1 to the earlier patch. Can we add some comments in BooleanWeight that 'requiredScoring' is a subset of 'required'?

        When reviewing the logic of e.g. req(), its not immediately obvious that its correct without going up hundreds of lines to look at how 'required' and 'requiredScoring' buckets are created (e.g. a MUST clause will be in both lists).

        I'm also staring at req() and trying to figure out what happens if we have only a single, non-scoring clause (yet needsScores=true for the BQ as a whole). I suppose it falls through to the BoostedScorer case? Maybe we can add a simple unit test for this case to ensure it does what its supposed to do?

        Show
        Robert Muir added a comment - +1 to the earlier patch. Can we add some comments in BooleanWeight that 'requiredScoring' is a subset of 'required'? When reviewing the logic of e.g. req(), its not immediately obvious that its correct without going up hundreds of lines to look at how 'required' and 'requiredScoring' buckets are created (e.g. a MUST clause will be in both lists). I'm also staring at req() and trying to figure out what happens if we have only a single, non-scoring clause (yet needsScores=true for the BQ as a whole). I suppose it falls through to the BoostedScorer case? Maybe we can add a simple unit test for this case to ensure it does what its supposed to do?
        Hide
        Adrien Grand added a comment -

        I'll add the suggested comments.

        I'm also staring at req() and trying to figure out what happens if we have only a single, non-scoring clause (yet needsScores=true for the BQ as a whole). I suppose it falls through to the BoostedScorer case? Maybe we can add a simple unit test for this case to ensure it does what its supposed to do?

        Exactly, it gets a BoostedScorer with a boost of 0. The tests already cover it (they create a single filter clause and make sure that scores are equal to 0), but I can make them more explicit.

        Show
        Adrien Grand added a comment - I'll add the suggested comments. I'm also staring at req() and trying to figure out what happens if we have only a single, non-scoring clause (yet needsScores=true for the BQ as a whole). I suppose it falls through to the BoostedScorer case? Maybe we can add a simple unit test for this case to ensure it does what its supposed to do? Exactly, it gets a BoostedScorer with a boost of 0. The tests already cover it (they create a single filter clause and make sure that scores are equal to 0), but I can make them more explicit.
        Hide
        Robert Muir added a comment -

        Yeah, i saw the random test, and i have no doubt its tested by that thing. But sometimes a very stupid-simple unit test is good in cases like this. It could prevent it from being broken by some boolean refactoring in the future.

        Show
        Robert Muir added a comment - Yeah, i saw the random test, and i have no doubt its tested by that thing. But sometimes a very stupid-simple unit test is good in cases like this. It could prevent it from being broken by some boolean refactoring in the future.
        Hide
        Adrien Grand added a comment -

        Patch:

        • more comments
        • added assertion that scorers is a subset of required in ConjunctionScorer
        • added unit test decicated to the single filter clause, with two cases: 1. when BooleanQuery rewrite to a single term query and 2. when Boolean cannot rewrite because of SHOULD clauses that return a null scorer.
        Show
        Adrien Grand added a comment - Patch: more comments added assertion that scorers is a subset of required in ConjunctionScorer added unit test decicated to the single filter clause, with two cases: 1. when BooleanQuery rewrite to a single term query and 2. when Boolean cannot rewrite because of SHOULD clauses that return a null scorer.
        Hide
        ASF subversion and git services added a comment -

        Commit 1658764 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1658764 ]

        LUCENE-6227: Add BooleanClause.Occur.FILTER.

        Show
        ASF subversion and git services added a comment - Commit 1658764 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1658764 ] LUCENE-6227 : Add BooleanClause.Occur.FILTER.
        Hide
        ASF subversion and git services added a comment -

        Commit 1658769 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1658769 ]

        LUCENE-6227: Add BooleanClause.Occur.FILTER.

        Show
        ASF subversion and git services added a comment - Commit 1658769 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1658769 ] LUCENE-6227 : Add BooleanClause.Occur.FILTER.
        Hide
        Paul Elschot added a comment -

        See also LUCENE-1345 and LUCENE-1518.

        Show
        Paul Elschot added a comment - See also LUCENE-1345 and LUCENE-1518 .
        Hide
        ASF subversion and git services added a comment -

        Commit 1662218 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1662218 ]

        LUCENE-6227: Fix explanations of FILTER clauses.

        Show
        ASF subversion and git services added a comment - Commit 1662218 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1662218 ] LUCENE-6227 : Fix explanations of FILTER clauses.
        Hide
        ASF subversion and git services added a comment -

        Commit 1662220 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1662220 ]

        LUCENE-6227: Fix explanations of FILTER clauses.

        Show
        ASF subversion and git services added a comment - Commit 1662220 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1662220 ] LUCENE-6227 : Fix explanations of FILTER clauses.
        Hide
        Timothy Potter added a comment -

        Bulk close after 5.1 release

        Show
        Timothy Potter added a comment - Bulk close after 5.1 release
        Hide
        yuejianjun added a comment -

        Adrien Grand

        Declarationorg.apache.lucene.search.QueryWrapperFilter
        @Deprecated
        Deprecated. You can use Query objects directly for filtering by using BooleanClause.Occur.FILTER clauses in a BooleanQuery.

        BooleanClause.Occur.FILTER is a BooleanClause.Occur.MUST of no score

        QueryWrapperFilter is a OR query
        BooleanClause.Occur.FILTER is a AND query

        why QueryWrapperFilter is replaced with the BooleanClause.Occur.FILTER ?

        Show
        yuejianjun added a comment - Adrien Grand Declarationorg.apache.lucene.search.QueryWrapperFilter @Deprecated Deprecated. You can use Query objects directly for filtering by using BooleanClause.Occur.FILTER clauses in a BooleanQuery. BooleanClause.Occur.FILTER is a BooleanClause.Occur.MUST of no score QueryWrapperFilter is a OR query BooleanClause.Occur.FILTER is a AND query why QueryWrapperFilter is replaced with the BooleanClause.Occur.FILTER ?
        Hide
        Adrien Grand added a comment -

        Maybe you can show how you are constructing queries currently so that I can help you move them to the new way of applying filters?

        Show
        Adrien Grand added a comment - Maybe you can show how you are constructing queries currently so that I can help you move them to the new way of applying filters?
        Hide
        yuejianjun added a comment -

        /**

        • filter 查询结果(Filter 过滤查询)。 filter=id:1,2
          */
          @SuppressWarnings("deprecation")
          public TermsFilter getTermsFilter(String filter) {
          try {
          if (filter != null && filter.length() > 0)
          Unknown macro: { String[] arrFilter = filter.split("}

          } catch (Exception e)

          { logger.error("SearchQuery getTermsFilter is error", e); }

          return null;
          }

        search id=1 or id=2
        how to use BooleanClause.Occur.FILTER replace ?

        Show
        yuejianjun added a comment - /** filter 查询结果(Filter 过滤查询)。 filter=id:1,2 */ @SuppressWarnings("deprecation") public TermsFilter getTermsFilter(String filter) { try { if (filter != null && filter.length() > 0) Unknown macro: { String[] arrFilter = filter.split("} } catch (Exception e) { logger.error("SearchQuery getTermsFilter is error", e); } return null; } search id=1 or id=2 how to use BooleanClause.Occur.FILTER replace ?
        Hide
        yuejianjun added a comment -

        /**

        • filter 查询结果(Filter 过滤查询)。 filter=id:1,2
          */
          @SuppressWarnings("deprecation")
          public TermsFilter getTermsFilter(String filter) {
          try {
          if (filter != null && filter.length() > 0)
          Unknown macro: { String[] arrFilter = filter.split(); String key = arrFilter[0]; String[] values = arrFilter[1].split(","); List<Term> listTerms = new ArrayList<Term>(); for (String value }

          } catch (Exception e)

          { logger.error("SearchQuery getTermsFilter is error", e); }

          return null;
          }

        search id=1 or id=2
        how to use BooleanClause.Occur.FILTER replace ?

        Show
        yuejianjun added a comment - /** filter 查询结果(Filter 过滤查询)。 filter=id:1,2 */ @SuppressWarnings("deprecation") public TermsFilter getTermsFilter(String filter) { try { if (filter != null && filter.length() > 0) Unknown macro: { String[] arrFilter = filter.split(); String key = arrFilter[0]; String[] values = arrFilter[1].split(","); List<Term> listTerms = new ArrayList<Term>(); for (String value } } catch (Exception e) { logger.error("SearchQuery getTermsFilter is error", e); } return null; } search id=1 or id=2 how to use BooleanClause.Occur.FILTER replace ?

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development