Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      It will still take time to completely remove Filter, but I think we should start deprecating it now to state our intention and encourage users to move to queries as soon as possible?

      1. LUCENE-6301.patch
        159 kB
        Adrien Grand
      2. LUCENE-6301.patch
        105 kB
        Adrien Grand

        Activity

        Hide
        Adrien Grand added a comment -

        My plan would be to deprecate:

        • Filter
        • FilteredQuery
        • CachingWrapperFilter

        And in these 3 cases recommend to use BooleanQuery with filter clauses instead.

        Show
        Adrien Grand added a comment - My plan would be to deprecate: Filter FilteredQuery CachingWrapperFilter And in these 3 cases recommend to use BooleanQuery with filter clauses instead.
        Hide
        Adrien Grand added a comment -

        We are getting closer: QueryWrapperFilter is now the last Filter impl in lucene/core. Filter would be hard to remove from trunk because there are lots of module that implement or consume filters, but we should be able to remove FilteredQuery by using a BooleanQuery with a FILTER clause instead. I will give it a try soon.

        Show
        Adrien Grand added a comment - We are getting closer: QueryWrapperFilter is now the last Filter impl in lucene/core. Filter would be hard to remove from trunk because there are lots of module that implement or consume filters, but we should be able to remove FilteredQuery by using a BooleanQuery with a FILTER clause instead. I will give it a try soon.
        Hide
        Adrien Grand added a comment -

        Here is a patch:

        • Filter, QueryWrapperFilter and FilteredQuery are deprecated
        • lucene/core is Filter-free (the only last occurrences are in tests to make sure the deprecated classes still work)

        The only thing that prevents us from removing FilteredQuery is lucene/spatial which has filters that only support random-access:

                @Override
                public DocIdSetIterator iterator() throws IOException {
                  throw new UnsupportedOperationException(
                      "Iteration is too slow; instead try FilteredQuery.QUERY_FIRST_FILTER_STRATEGY");
                  //Note that if you're truly bent on doing this, then see FunctionValues.getRangeScorer
                }
        

        But it should be possible to migrate it to two-phase approximations and having FilteredQuery deprecated in the meantime will help ensure no new code is using it. I don't think it should hold this issue as having filters removed from the core is already a good step towards removing filters entirely?

        Show
        Adrien Grand added a comment - Here is a patch: Filter, QueryWrapperFilter and FilteredQuery are deprecated lucene/core is Filter-free (the only last occurrences are in tests to make sure the deprecated classes still work) The only thing that prevents us from removing FilteredQuery is lucene/spatial which has filters that only support random-access: @Override public DocIdSetIterator iterator() throws IOException { throw new UnsupportedOperationException( "Iteration is too slow; instead try FilteredQuery.QUERY_FIRST_FILTER_STRATEGY"); //Note that if you're truly bent on doing this, then see FunctionValues.getRangeScorer } But it should be possible to migrate it to two-phase approximations and having FilteredQuery deprecated in the meantime will help ensure no new code is using it. I don't think it should hold this issue as having filters removed from the core is already a good step towards removing filters entirely?
        Hide
        David Smiley added a comment -

        Hi Adrien.

        But it should be possible to migrate it to two-phase approximations and having FilteredQuery deprecated in the meantime will help ensure no new code is using it. I don't think it should hold this issue as having filters removed from the core is already a good step towards removing filters entirely?

        +1. I think migrating it would be a separate JIRA issue to be created. Can it still throw an exception if you use it in such a way as to iterate? This is to prevent people from mis-using it. Do you know of a similar scorer I can look at?

        Another API change to deprecate filters is modifying the SpatialStrategy abstraction. It has a makeQuery and makeFilter pair of methods, which are effectively the same thing except for the return type. I think makeFilter should become deprecated now (as part of this issue).

        Show
        David Smiley added a comment - Hi Adrien. But it should be possible to migrate it to two-phase approximations and having FilteredQuery deprecated in the meantime will help ensure no new code is using it. I don't think it should hold this issue as having filters removed from the core is already a good step towards removing filters entirely? +1. I think migrating it would be a separate JIRA issue to be created. Can it still throw an exception if you use it in such a way as to iterate? This is to prevent people from mis-using it. Do you know of a similar scorer I can look at? Another API change to deprecate filters is modifying the SpatialStrategy abstraction. It has a makeQuery and makeFilter pair of methods, which are effectively the same thing except for the return type. I think makeFilter should become deprecated now (as part of this issue).
        Hide
        Adrien Grand added a comment -

        Can it still throw an exception if you use it in such a way as to iterate? This is to prevent people from mis-using it. Do you know of a similar scorer I can look at?

        You could do the equivalent of what these filters are currently doing by making the scorer throw an exception when calling nextDoc() or advance(). This would prevent these queries from being used directly and force consumers to use the approximation for intersections (which BooleanQuery with MUST/FILTER clauses will happily do). However beware that while the approximation will be used for advancing the iterator, score() and freq() will still be called on the scorer.

        Do you know of a similar scorer I can look at?

        DocValuesTermsQuery should be similar. It cannot advance at all so it returns an iterator that matches all documents as an approximation and its matches() impl checks doc values.

        I think makeFilter should become deprecated now (as part of this issue).

        OK, I'll have a look at it.

        Show
        Adrien Grand added a comment - Can it still throw an exception if you use it in such a way as to iterate? This is to prevent people from mis-using it. Do you know of a similar scorer I can look at? You could do the equivalent of what these filters are currently doing by making the scorer throw an exception when calling nextDoc() or advance(). This would prevent these queries from being used directly and force consumers to use the approximation for intersections (which BooleanQuery with MUST/FILTER clauses will happily do). However beware that while the approximation will be used for advancing the iterator, score() and freq() will still be called on the scorer. Do you know of a similar scorer I can look at? DocValuesTermsQuery should be similar. It cannot advance at all so it returns an iterator that matches all documents as an approximation and its matches() impl checks doc values. I think makeFilter should become deprecated now (as part of this issue). OK, I'll have a look at it.
        Hide
        Adrien Grand added a comment -

        Also I forgot to mention: the patch also fixes a bug in TermAutomatonScorer.advance. Since it was always tested with a filter with random access and a high cost, the filtered query used the filter randomly and never advanced the query.

        Show
        Adrien Grand added a comment - Also I forgot to mention: the patch also fixes a bug in TermAutomatonScorer.advance. Since it was always tested with a filter with random access and a high cost, the filtered query used the filter randomly and never advanced the query.
        Hide
        Adrien Grand added a comment -

        We made quite some progress since this issue was last updated, FilteredQuery has been removed already and Lucene now almost doesn't rely on Filter anymore (by the way, thanks to those who helped me, in particular David). Here is a patch that:

        • removes remaining usage of Filter in Lucene (mostly tests)
        • moves Filter, QueryWrapperFilter, FilteredDocIdSet and BitsFilteredDocIdSet to Solr. When backporting, this would stay in Lucene but with an "@Deprecated" annotation.

        It would be nice to make Solr not use Filter anymore as well but Filter is still an important abstraction in Solr because of how it's integrated into DocSet and SolrConstantScoreQuery. I can help here but can't do it alone. In addition, I don't think it should hold the removal of Filter from Lucene?

        Show
        Adrien Grand added a comment - We made quite some progress since this issue was last updated, FilteredQuery has been removed already and Lucene now almost doesn't rely on Filter anymore (by the way, thanks to those who helped me, in particular David). Here is a patch that: removes remaining usage of Filter in Lucene (mostly tests) moves Filter, QueryWrapperFilter, FilteredDocIdSet and BitsFilteredDocIdSet to Solr. When backporting, this would stay in Lucene but with an "@Deprecated" annotation. It would be nice to make Solr not use Filter anymore as well but Filter is still an important abstraction in Solr because of how it's integrated into DocSet and SolrConstantScoreQuery. I can help here but can't do it alone. In addition, I don't think it should hold the removal of Filter from Lucene?
        Hide
        Adrien Grand added a comment -

        I plan to commit next week if there are no objections.

        Show
        Adrien Grand added a comment - I plan to commit next week if there are no objections.
        Hide
        David Smiley added a comment -

        +1 Nice Adrien! Looks like this was a fair amount of work. I agree on moving the abstractions to Solr. Wether or not it remains in Solr shouldn't hold up this.

        Show
        David Smiley added a comment - +1 Nice Adrien! Looks like this was a fair amount of work. I agree on moving the abstractions to Solr. Wether or not it remains in Solr shouldn't hold up this.
        Hide
        ASF subversion and git services added a comment -

        Commit 1708097 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1708097 ]

        LUCENE-6301: Removal of org.apache.lucene.Filter.

        From a Lucene perspective Filter is gone. However it was still used for things
        like DocSet and SolrConstantScoreQuery in Solr, so it has been moved to
        the oas.search package for now, even though in the long term it would be nice
        for Solr to move to the Query API entirely as well.

        Show
        ASF subversion and git services added a comment - Commit 1708097 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1708097 ] LUCENE-6301 : Removal of org.apache.lucene.Filter. From a Lucene perspective Filter is gone. However it was still used for things like DocSet and SolrConstantScoreQuery in Solr, so it has been moved to the oas.search package for now, even though in the long term it would be nice for Solr to move to the Query API entirely as well.
        Hide
        Jack Krupansky added a comment - - edited

        I know this change has been in progress for awhile, but it just kind of sunk for me finally and now I'm wondering what the impact on Solr will be. I mean, wasn't Filter supposed to be a big performance win over a Query since it eliminates the performance impact of scoring? If that was the case, is Lucene proving some alternate method of achieving a similar performance improvement? I think it is, but... not stated quite so explicitly. An example of the expected migration would help a lot. I think the example should be in the Lucene Javadoc - "To filter documents without the performance overhead of scoring, use the following technique..." If I understand properly, one would simply wrap the query in a BooleanQuery with a single clause that uses BooleanQuery.Clause.FILTER and that would have exactly the same effect (and performance gain) as the old Filter class. Is that statement 100% accurate? If so, it would be good to make it explicit here in Jira, in the deprecation comment in the the Filter class, and in BooleanQuery as well. Thanks!

        Show
        Jack Krupansky added a comment - - edited I know this change has been in progress for awhile, but it just kind of sunk for me finally and now I'm wondering what the impact on Solr will be. I mean, wasn't Filter supposed to be a big performance win over a Query since it eliminates the performance impact of scoring? If that was the case, is Lucene proving some alternate method of achieving a similar performance improvement? I think it is, but... not stated quite so explicitly. An example of the expected migration would help a lot. I think the example should be in the Lucene Javadoc - "To filter documents without the performance overhead of scoring, use the following technique..." If I understand properly, one would simply wrap the query in a BooleanQuery with a single clause that uses BooleanQuery.Clause.FILTER and that would have exactly the same effect (and performance gain) as the old Filter class. Is that statement 100% accurate? If so, it would be good to make it explicit here in Jira, in the deprecation comment in the the Filter class, and in BooleanQuery as well. Thanks!
        Hide
        Adrien Grand added a comment -

        wasn't Filter supposed to be a big performance win over a Query since it eliminates the performance impact of scoring? If that was the case, is Lucene proving some alternate method of achieving a similar performance improvement?

        Over the past releases, we progressively improved to Query/Collector API so that queries can detect whether scores are needed and optimize in case scores are not needed in order to eg. avoid to read frequencies or stop after the first occurence is found in the case of phrase queries (LUCENE-6218). Everything is detected automatically now, for instance if you wrap a query in a ConstantScoreQuery, it will automatically notice that scores are not needed. If you sort by the value of a field and don't request scores, then again it will notice that scores are not needed and optimize query execution.

        Something else that Filters provided but not queries was random-access support. But it was a bit incomplete since Filters had no way to tell FilteredQuery if they should rather be consumed using iteration or random-access and making the wrong decision could sometimes result in super slow queries that would try to call advance() on a DocValuesRangeQuery which doesn't use an index and needs to perform a linear scan in order to locate the next match. So we added two-phase iteration support to queries (LUCENE-6198) which allows us to dissert queries into a fast approximation and a slow verification phase. For instance, a phrase query "A B" would return the conjunction (+A +B) as an approximation and check if it can find the two terms at consecutive positions as a verification phase.

        that would have exactly the same effect (and performance gain) as the old Filter class. Is that statement 100% accurate?

        If you use a query that provides an efficient approximation (such as phrase queries) as a filter, things could be considerably faster. Otherwise, things will mostly work the same way as before and you could have slight speedups or slowdowns given that we use different code paths that hotspot might optimize differently.

        I will look into the deprecation comments for Filter.

        Show
        Adrien Grand added a comment - wasn't Filter supposed to be a big performance win over a Query since it eliminates the performance impact of scoring? If that was the case, is Lucene proving some alternate method of achieving a similar performance improvement? Over the past releases, we progressively improved to Query/Collector API so that queries can detect whether scores are needed and optimize in case scores are not needed in order to eg. avoid to read frequencies or stop after the first occurence is found in the case of phrase queries ( LUCENE-6218 ). Everything is detected automatically now, for instance if you wrap a query in a ConstantScoreQuery, it will automatically notice that scores are not needed. If you sort by the value of a field and don't request scores, then again it will notice that scores are not needed and optimize query execution. Something else that Filters provided but not queries was random-access support. But it was a bit incomplete since Filters had no way to tell FilteredQuery if they should rather be consumed using iteration or random-access and making the wrong decision could sometimes result in super slow queries that would try to call advance() on a DocValuesRangeQuery which doesn't use an index and needs to perform a linear scan in order to locate the next match. So we added two-phase iteration support to queries ( LUCENE-6198 ) which allows us to dissert queries into a fast approximation and a slow verification phase. For instance, a phrase query "A B" would return the conjunction (+A +B) as an approximation and check if it can find the two terms at consecutive positions as a verification phase. that would have exactly the same effect (and performance gain) as the old Filter class. Is that statement 100% accurate? If you use a query that provides an efficient approximation (such as phrase queries) as a filter, things could be considerably faster. Otherwise, things will mostly work the same way as before and you could have slight speedups or slowdowns given that we use different code paths that hotspot might optimize differently. I will look into the deprecation comments for Filter.
        Hide
        ASF subversion and git services added a comment -

        Commit 1708121 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1708121 ]

        LUCENE-6301: Deprecate Filter.

        Show
        ASF subversion and git services added a comment - Commit 1708121 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1708121 ] LUCENE-6301 : Deprecate Filter.
        Hide
        Adrien Grand added a comment -

        Jack, I just backported to 5.x. Feel free to review and suggest improvements if you feel that the migration path is not clear enough.

        Show
        Adrien Grand added a comment - Jack, I just backported to 5.x. Feel free to review and suggest improvements if you feel that the migration path is not clear enough.
        Hide
        Jack Krupansky added a comment -

        Thanks! LGTM. Now let's see if the Solr guys pick up on this.

        Show
        Jack Krupansky added a comment - Thanks! LGTM. Now let's see if the Solr guys pick up on this.

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development