SOLR-7219

Access filter cache from lucene query syntax

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4
    • Component/s: None
    • Labels: None

      Description

      A filter query retrieves a set of documents matching a query from the filter cache. Since scores are not cached, all documents that match the filter produce the same score. Cached filters will be extremely fast when they are used again in another query.

      Filter Query Example:

      description:HDTV OR filter(+promotion:tv +promotion_date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY])
      

      The power of the filter() syntax is that it may be used anywhere within a lucene/solr query syntax. Normal fq support is limited to top-level conjunctions.
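
As a rough illustration of sending the example query above to Solr over HTTP, the sketch below URL-encodes it into the `q` parameter of a `/select` request. The host, port, and collection name (`localhost:8983`, `products`) are placeholders assumed for this example and do not come from the issue.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical Solr location; adjust to your own deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def build_select_url(query: str, rows: int = 10) -> str:
    """Return a /select URL with the query string safely URL-encoded."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


# The filter() clause is embedded inside a larger boolean query.
QUERY = (
    "description:HDTV OR "
    "filter(+promotion:tv +promotion_date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY])"
)

url = build_select_url(QUERY)
print(url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["q"][0] == QUERY)
```

Encoding matters here because a literal `+` in a URL query string is decoded as a space, which would drop the required-clause operators inside `filter(...)`.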

      Attachment: SOLR-7219.patch (47 kB, Yonik Seeley)

        Activity

        Yonik Seeley added a comment -

        Here's the patch implementing the proposed syntax, along with tests that the filter cache is actually being hit.

        ASF subversion and git services added a comment -

        Commit 1694708 from Yonik Seeley in branch 'dev/trunk'
        [ https://svn.apache.org/r1694708 ]

        SOLR-7219: add filter() to query syntax

        ASF subversion and git services added a comment -

        Commit 1694709 from Yonik Seeley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1694709 ]

        SOLR-7219: add filter() to query syntax

        Steve Rowe added a comment -

        Frequent 5.x failures (nearly 100%?), e.g. https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/3405/:

           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestSolrQueryParser -Dtests.method=testFilter -Dtests.seed=60D190157DF25DD1 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=no_NO -Dtests.timezone=Atlantic/Canary -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
           [junit4] ERROR   0.02s J2 | TestSolrQueryParser.testFilter <<<
           [junit4]    > Throwable #1: java.lang.UnsupportedOperationException: Query SortedIntDocSetTopFilter does not implement createWeight
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([60D190157DF25DD1:A803F583E2FB7A8B]:0)
           [junit4]    > 	at org.apache.lucene.search.Query.createWeight(Query.java:79)
           [junit4]    > 	at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
           [junit4]    > 	at org.apache.lucene.search.ConstantScoreQuery.createWeight(ConstantScoreQuery.java:117)
           [junit4]    > 	at org.apache.solr.query.FilterQuery.createWeight(FilterQuery.java:96)
           [junit4]    > 	at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
           [junit4]    > 	at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56)
           [junit4]    > 	at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203)
           [junit4]    > 	at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
           [junit4]    > 	at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:838)
           [junit4]    > 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486)
           [junit4]    > 	at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1259)
           [junit4]    > 	at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:941)
           [junit4]    > 	at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1103)
           [junit4]    > 	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1625)
           [junit4]    > 	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1501)
           [junit4]    > 	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)
           [junit4]    > 	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)
           [junit4]    > 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
           [junit4]    > 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
           [junit4]    > 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
           [junit4]    > 	at org.apache.solr.util.TestHarness.query(TestHarness.java:320)
           [junit4]    > 	at org.apache.solr.util.TestHarness.query(TestHarness.java:302)
           [junit4]    > 	at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:831)
           [junit4]    > 	at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:800)
           [junit4]    > 	at org.apache.solr.search.TestSolrQueryParser.testFilter(TestSolrQueryParser.java:224)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
        
        Yonik Seeley added a comment -

        Interesting... some key difference between 5x and trunk. Looking into it now.

        ASF subversion and git services added a comment -

        Commit 1694807 from Yonik Seeley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1694807 ]

        SOLR-7219: temporary disable failing test

        Yonik Seeley added a comment -

        I've temporarily disabled the test. At first glance it looks like Lucene's Filter->Query transition introduced a back-compat break.

        David Smiley added a comment -

        Just curious; why is this feature implemented as a change to the query grammar versus adding a QParser? I suppose the grammar change might be nicer looking – which is subjective. Another approach could have been a special local param (e.g. cache=true) on top of any QParser.

        Erik Hatcher added a comment -

        Re: David Smiley's comment, I do think we shouldn't call this the "lucene" query parser any more. Maybe it should be renamed/aliased to "solr"?

        Yonik Seeley added a comment -

        Just curious; why is this feature implemented as a change to the query grammar versus adding a QParser?

        The feature was about being able to access the filter cache anywhere within a lucene/solr query string.
        Since qparsers can't encapsulate arbitrary query clauses in lucene syntax, one would need to use additional params.

        foo:bar filter(instock:true)
        vs
        foo:bar {!filter v=$myfilt}&myfilt=instock:true
        

        So possible with just a qparser, but not as easy or nice looking.
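
To make the difference concrete, here is a small sketch of the request parameters each form would need. The `{!filter v=$myfilt}` form is reproduced from the comment above as the hypothetical alternative being discussed; this sketch does not confirm that such a qparser is registered in any Solr release.

```python
from urllib.parse import urlencode

# Form 1: filter() written inline in the query string (this issue's syntax).
inline_params = {"q": "foo:bar filter(instock:true)"}

# Form 2: the qparser alternative from the comment. The filter body is not
# written inline, so it moves to a separate request parameter (myfilt) that
# the local param v=$myfilt dereferences.
qparser_params = {
    "q": "foo:bar {!filter v=$myfilt}",
    "myfilt": "instock:true",
}

inline_qs = urlencode(inline_params)
qparser_qs = urlencode(qparser_params)
print(inline_qs)
print(qparser_qs)
```

The inline form keeps the whole query in one parameter, while the qparser form spreads it across two that must be kept in sync.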

        ASF subversion and git services added a comment -

        Commit 1695133 from Yonik Seeley in branch 'dev/trunk'
        [ https://svn.apache.org/r1695133 ]

        SOLR-7219: use SolrConstantScoreQuery to fix 5x filter() break

        ASF subversion and git services added a comment -

        Commit 1695135 from Yonik Seeley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1695135 ]

        SOLR-7219: use SolrConstantScoreQuery to fix 5x filter() break

        ASF subversion and git services added a comment -

        Commit 1695136 from Yonik Seeley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1695136 ]

        SOLR-7219: re-enable filter test

        Yonik Seeley added a comment -

        Definitely seems like a back compat break in the lucene Filter API, but for now I just switched from ConstantScoreQuery to SolrConstantScoreQuery to fix this issue.

        Alexandre Rafalovitch added a comment -

        Does this allow ORing the filter queries efficiently? That was always the limitation of fq: you could only effectively AND them.

        If it does, we should document that rather prominently to make people happy.

        Yonik Seeley added a comment -

        Does this allow ORing the filter queries efficiently?

        Yep,
        fq=filter(foo) filter(bar) filter(baz)

        Although we could prob do it more efficiently in the future. The current solution will use lucene's disjunction scorer, but we can get more efficient than that if we know we are dealing with DocSets.
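
The idea of combining cached doc sets directly can be pictured with a toy model. This is only a sketch under stated assumptions: plain Python frozensets stand in for Solr's DocSets, `ToyFilterCache` and `union` are hypothetical names, and nothing here reflects how Solr or Lucene actually implement the filter cache or the disjunction scorer.

```python
from typing import Callable, Dict, FrozenSet, Iterable


class ToyFilterCache:
    """Maps a filter's query string to the set of matching doc ids."""

    def __init__(self, docs: Dict[int, dict]):
        self.docs = docs
        self.cache: Dict[str, FrozenSet[int]] = {}
        self.hits = 0
        self.misses = 0

    def doc_set(self, key: str, predicate: Callable[[dict], bool]) -> FrozenSet[int]:
        """Return the cached doc set for key, computing it on a miss."""
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        result = frozenset(i for i, d in self.docs.items() if predicate(d))
        self.cache[key] = result
        return result


def union(doc_sets: Iterable[FrozenSet[int]]) -> FrozenSet[int]:
    """OR several doc sets together with a plain set union."""
    return frozenset().union(*doc_sets)


docs = {
    1: {"color": "red"},
    2: {"color": "blue"},
    3: {"color": "green"},
}
cache = ToyFilterCache(docs)
filters = {
    "color:red": lambda d: d["color"] == "red",
    "color:blue": lambda d: d["color"] == "blue",
}

# First pass computes and caches each set; second pass only reads the cache.
first = union(cache.doc_set(k, p) for k, p in filters.items())
second = union(cache.doc_set(k, p) for k, p in filters.items())
print(sorted(first), cache.hits, cache.misses)
```

Because every matching document gets the same score, the OR of several cached filters needs only set membership, which is why a set union is enough in this model.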

        Alexandre Rafalovitch added a comment -

        Could we add this to the Changes file or somewhere else as an example, then? It is both super cool and completely non-intuitive.

        Alessandro Benedetti added a comment -

        I was wondering whether it is really necessary to add the "filter" element to the syntax when we want to re-use the filter.

        Isn't it always true that we would ideally re-use all our boolean clauses?
        Of course, doing this automatically could produce a very fast-changing filterCache.
        But this should probably be completely transparent to the user.
        I got used to the concept of filter queries, but the more time passes, the more I see the opportunity in automatically caching all the boolean clauses in the main query block.
        What could be the cons of that approach?


          People

          • Assignee: Yonik Seeley
          • Reporter: Yonik Seeley
          • Votes: 0
          • Watchers: 7
