SOLR-7219

Access filter cache from lucene query syntax

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4
    • Component/s: None
    • Labels:
      None

      Description

      A filter query retrieves a set of documents matching a query from the filter cache. Since scores are not cached, all documents that match the filter produce the same score. Cached filters will be extremely fast when they are used again in another query.

      Filter Query Example:

      description:HDTV OR filter(+promotion:tv +promotion_date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY])
      

      The power of the filter() syntax is that it may be used anywhere within a lucene/solr query. Normal fq support is limited to top-level conjunctions.
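
As a rough illustration of sending such a query over HTTP, the sketch below percent-encodes the example above as the q parameter of a select request. The host, port, and collection name (localhost:8983, products) are assumptions made for illustration and are not part of this issue; only the query string itself comes from the description.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: host, port, and collection name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"

# Query string taken from the description above.
QUERY = (
    "description:HDTV OR "
    "filter(+promotion:tv +promotion_date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY])"
)


def build_select_url(query, base_url=SOLR_SELECT_URL):
    """Return a select URL with the query percent-encoded as the q parameter."""
    # urlencode escapes '+', ':', '[', ']' and spaces, so the filter() clause
    # is not mangled in transit.
    return base_url + "?" + urlencode({"q": query})


url = build_select_url(QUERY)

# Round trip: decoding the URL should give back the original query string.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(url)
print(decoded == QUERY)
```

Encoding matters here because a literal + in a URL query string is decoded as a space, which would silently drop the required-clause operators inside filter().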

      1. SOLR-7219.patch
        47 kB
        Yonik Seeley

          Activity

          Yonik Seeley added a comment -

          Here's the patch implementing the proposed syntax, along with tests that the filter cache is actually being hit.

          ASF subversion and git services added a comment -

          Commit 1694708 from Yonik Seeley in branch 'dev/trunk'
          [ https://svn.apache.org/r1694708 ]

          SOLR-7219: add filter() to query syntax

          ASF subversion and git services added a comment -

          Commit 1694709 from Yonik Seeley in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1694709 ]

          SOLR-7219: add filter() to query syntax

          Steve Rowe added a comment -

          Frequent 5.x failures (nearly 100%?), e.g. https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/3405/:

             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestSolrQueryParser -Dtests.method=testFilter -Dtests.seed=60D190157DF25DD1 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=no_NO -Dtests.timezone=Atlantic/Canary -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
             [junit4] ERROR   0.02s J2 | TestSolrQueryParser.testFilter <<<
             [junit4]    > Throwable #1: java.lang.UnsupportedOperationException: Query SortedIntDocSetTopFilter does not implement createWeight
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([60D190157DF25DD1:A803F583E2FB7A8B]:0)
             [junit4]    > 	at org.apache.lucene.search.Query.createWeight(Query.java:79)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
             [junit4]    > 	at org.apache.lucene.search.ConstantScoreQuery.createWeight(ConstantScoreQuery.java:117)
             [junit4]    > 	at org.apache.solr.query.FilterQuery.createWeight(FilterQuery.java:96)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
             [junit4]    > 	at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:56)
             [junit4]    > 	at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:838)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486)
             [junit4]    > 	at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1259)
             [junit4]    > 	at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:941)
             [junit4]    > 	at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1103)
             [junit4]    > 	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1625)
             [junit4]    > 	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1501)
             [junit4]    > 	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)
             [junit4]    > 	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)
             [junit4]    > 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
             [junit4]    > 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
             [junit4]    > 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
             [junit4]    > 	at org.apache.solr.util.TestHarness.query(TestHarness.java:320)
             [junit4]    > 	at org.apache.solr.util.TestHarness.query(TestHarness.java:302)
             [junit4]    > 	at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:831)
             [junit4]    > 	at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:800)
             [junit4]    > 	at org.apache.solr.search.TestSolrQueryParser.testFilter(TestSolrQueryParser.java:224)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
          
          Yonik Seeley added a comment -

          Interesting... some key difference between 5x and trunk. Looking into it now.

          ASF subversion and git services added a comment -

          Commit 1694807 from Yonik Seeley in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1694807 ]

          SOLR-7219: temporary disable failing test

          Yonik Seeley added a comment -

          I've temporarily disabled the test. At first glance it looks like Lucene's Filter->Query transition introduced a back-compat break.

          David Smiley added a comment -

          Just curious; why is this feature implemented as a change to the query grammar versus adding a QParser? I suppose the grammar change might be nicer looking – which is subjective. Another approach could have been a special local param (e.g. cache=true) on top of any QParser.

          Erik Hatcher added a comment -

          Re: David Smiley's comment, I do think we shouldn't call this the "lucene" query parser any more. Maybe it should be renamed/aliased to "solr"?

          Yonik Seeley added a comment -

          Just curious; why is this feature implemented as a change to the query grammar versus adding a QParser?

          The feature was about being able to access the filter cache anywhere within a lucene/solr query string.
          Since qparsers can't encapsulate arbitrary query clauses in lucene syntax, one would need to use additional params.

          foo:bar filter(instock:true)
          vs
          foo:bar {!filter v=$myfilt}&myfilt=instock:true
          

          So possible with just a qparser, but not as easy or nice looking.
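
To make the comparison concrete, the sketch below builds the request parameters for both forms. The {!filter ...} local-params form is the hypothetical qparser alternative discussed in this comment, not something the patch is stated to add, and myfilt is only the illustrative parameter name used above.

```python
from urllib.parse import urlencode

# Form 1: filter() embedded directly in the query string (what this issue adds).
inline_params = {"q": "foo:bar filter(instock:true)"}

# Form 2: the hypothetical qparser alternative, which needs an extra request
# parameter (myfilt) to carry the filter clause.
qparser_params = {
    "q": "foo:bar {!filter v=$myfilt}",
    "myfilt": "instock:true",
}

for label, params in (("inline", inline_params), ("qparser", qparser_params)):
    # Print how many request parameters each form needs and its encoded form.
    print(label, len(params), urlencode(params))
```

The inline form keeps the whole query in a single parameter, while the qparser form has to split the filter clause out into a second one.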

          ASF subversion and git services added a comment -

          Commit 1695133 from Yonik Seeley in branch 'dev/trunk'
          [ https://svn.apache.org/r1695133 ]

          SOLR-7219: use SolrConstantScoreQuery to fix 5x filter() break

          ASF subversion and git services added a comment -

          Commit 1695135 from Yonik Seeley in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1695135 ]

          SOLR-7219: use SolrConstantScoreQuery to fix 5x filter() break

          ASF subversion and git services added a comment -

          Commit 1695136 from Yonik Seeley in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1695136 ]

          SOLR-7219: re-enable filter test

          Yonik Seeley added a comment -

          Definitely seems like a back compat break in the lucene Filter API, but for now I just switched from ConstantScoreQuery to SolrConstantScoreQuery to fix this issue.

          Alexandre Rafalovitch added a comment -

          Does this allow ORing the filter queries efficiently? The limitation of *fq* was always that you could only effectively AND them.

          If it does, we should document that rather prominently to make people happy.

          Yonik Seeley added a comment -

          Does this allow ORing the filter queries efficiently?

          Yep,
          fq=filter(foo) filter(bar) filter(baz)

          Although we could probably do it more efficiently in the future. The current solution will use lucene's disjunction scorer, but we can get more efficient than that if we know we are dealing with DocSets.
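
The idea can be modelled outside Solr. The toy sketch below is not Solr's implementation: it caches each filter's matching document IDs under its query string and ORs the cached sets with a plain set union, which is the kind of DocSet-level shortcut hinted at above. The index contents, the ToyFilterCache class, and the match helper are all invented for illustration.

```python
# Toy in-memory "index": document ID -> field values (invented data).
DOCS = {
    1: {"color": "red"},
    2: {"color": "blue"},
    3: {"color": "green"},
    4: {"color": "red"},
}


def match(field_query):
    """Return the IDs of documents matching a 'field:value' query."""
    field, value = field_query.split(":", 1)
    return frozenset(
        doc_id for doc_id, doc in DOCS.items() if doc.get(field) == value
    )


class ToyFilterCache:
    """Maps a filter's query string to its cached set of matching doc IDs."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, field_query):
        if field_query in self.entries:
            self.hits += 1
        else:
            # Cache miss: evaluate the filter once and remember the result.
            self.misses += 1
            self.entries[field_query] = match(field_query)
        return self.entries[field_query]


def or_filters(cache, *field_queries):
    """OR several filters by unioning their cached doc ID sets."""
    result = set()
    for field_query in field_queries:
        result |= cache.get(field_query)
    return result


cache = ToyFilterCache()
first = or_filters(cache, "color:red", "color:blue")    # two cache misses
second = or_filters(cache, "color:red", "color:green")  # one hit, one miss
print(sorted(first), sorted(second), cache.hits, cache.misses)
```

The second request reuses the cached color:red set and only evaluates color:green, which is the reuse that filter() makes available to OR clauses.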

          Alexandre Rafalovitch added a comment -

          Could we add this to the Changes file or somewhere else as an example, then? It is both super cool and completely non-intuitive.

          Alessandro Benedetti added a comment -

          I was wondering if it is really necessary to add the "filter" element to the syntax when we want to re-use the filter.

          Isn't it always true that we would ideally re-use all our boolean clauses?
          Of course, doing this automatically could produce a very fast-changing filterCache.
          But this should probably be completely transparent to the user.
          I got used to the concept of filter queries, but the more time passes, the more I see the opportunity in having automatic caching for all the boolean clauses in the main query block.
          What could be the cons of that approach?
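
One possible downside, the fast-changing cache mentioned above, can be illustrated with a toy model. The sketch below does not model Solr's filterCache or its eviction policy; it assumes a small least-recently-used cache (ToyLRUCache, an invented class with a deliberately tiny capacity) and invented clause names, and compares caching only an explicitly marked clause with caching every clause automatically.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.evictions = 0

    def lookup(self, key):
        """Return True on a hit; on a miss, insert the key and return False."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
            self.evictions += 1
        return False


def count_hits(cache, clauses):
    """Look up every clause in order and count the cache hits."""
    return sum(cache.lookup(clause) for clause in clauses)


# One clause (instock:true) recurs in every request; the others are one-offs.
requests = [
    ["instock:true", f"title:word{i}", f"author:name{i}"] for i in range(5)
]

# Explicit caching: only the clause the user marked is cached.
explicit = ToyLRUCache(capacity=2)
explicit_hits = count_hits(explicit, [r[0] for r in requests])

# Automatic caching: every clause is cached, so one-off clauses push out
# the clause that would actually have been reused.
automatic = ToyLRUCache(capacity=2)
automatic_hits = count_hits(automatic, [c for r in requests for c in r])

print(explicit_hits, explicit.evictions, automatic_hits, automatic.evictions)
```

In this toy setup the one-off clauses evict the reusable clause before it is requested again, so automatic caching gets no hits at all. A real cache is far larger, so the effect would be less extreme.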


            People

            • Assignee:
              Yonik Seeley
              Reporter:
              Yonik Seeley
            • Votes:
              0
              Watchers:
              7
