Solr / SOLR-1223

Query Filter fq with OR operator

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: search
    • Labels: None

      Description

      See this issue for some background. Today, all of the Query filters specified with the fq parameter are AND'd together.

      This issue is about allowing a set of filters to be OR'd together (in addition to having another set of filters that are AND'd). The OR'd filters would of course be applied before any scoring is done.

      The advantage of this feature is that you will be able to break up complex filters into simple, more cacheable filters, which should improve performance.


          Activity

          Mikhail Khludnev added a comment -

          You can check the workaround described under "OR Filters" in http://blog.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html

          Manuel Lenormand added a comment -

          For simplicity, let's assume all the clauses are already cached as filter queries, i.e. as bitsets, and that they were warmed by a new searcher. All we need then is to create a new bitset by merging the cached query filters with a logical OR/AND.

          Looking at QueryComponent.prepare (line 167), a newly parsed filter query is added to the ResponseBuilder. So before this stage we need to retrieve the cached filters (from SolrIndexSearcher) and add a new cached filter built from the merged bitset.
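The merge step described above can be sketched as follows. This is a minimal Python illustration, not Solr code: it assumes each cached filter is a Python integer used as a bitset over internal document ids, and `filter_cache`, `merge_cached`, and the other helper names are hypothetical stand-ins for Solr's filterCache and DocSet classes.

```python
from functools import reduce


def to_bits(doc_ids):
    """Encode internal doc ids as an integer bitset (bit i set => doc i matches)."""
    bits = 0
    for doc in doc_ids:
        bits |= 1 << doc
    return bits


def to_docs(bits):
    """Decode an integer bitset into a sorted list of doc ids."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


def merge_cached(filter_cache, keys, op, max_doc):
    """OR or AND the cached bitsets for `keys`, then cache the merged result.

    `filter_cache` is a hypothetical stand-in for Solr's filterCache; the
    merged bitset is stored under a synthetic key so later requests can reuse it.
    """
    bitsets = [filter_cache[k] for k in keys]
    if op == "OR":
        merged = reduce(lambda a, b: a | b, bitsets, 0)
    elif op == "AND":
        all_docs = (1 << max_doc) - 1  # start from "every doc matches"
        merged = reduce(lambda a, b: a & b, bitsets, all_docs)
    else:
        raise ValueError(f"unsupported operator: {op}")
    filter_cache[(op, tuple(sorted(keys)))] = merged
    return merged


# Example: three warmed filters over an index of 8 documents.
cache = {
    "model:ford": to_bits([0, 3, 5]),
    "model:toyota": to_bits([1, 3]),
    "color:green": to_bits([3, 4, 5]),
}
either_model = merge_cached(cache, ["model:ford", "model:toyota"], "OR", 8)
print(to_docs(either_model))  # [0, 1, 3, 5]
```

Integer bitsets keep the OR/AND to single operations; a real implementation would work on Solr's own DocSet types instead.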

          Ron Buchanan added a comment -

          If you care for input from a nobody who's fairly new to Solr: I like Hoss Man's idea, and I very, very much want this.

          My thought, though, was that it would make sense to use the "v=$paramName" facility and just add multiple instances of "paramName".

          Hoss Man added a comment -

          Ugh... undoing Jira's ridiculous new "you hit a key while focus was on your browser, so we're going to edit the text under your mouse" behavior.

          Hoss Man added a comment -

          I had an idea about this a while back that I thought I had posted as a comment, but apparently I forgot.

          What if we implemented a very simple "or" (and, for parity, "and") QParser that took as its input the name of another (multivalued) param, iterated over its values, parsed each one according to the (possibly local) defType, and built up a BooleanQuery containing all of the resulting clauses? I.e.:

          ...
          &fq={!or tag=model}models
          &models=model:ford
          &models=model:toyota
          &models=model:gm
          &fq={!or tag=color defType=field f=color}colors
          &colors=Cherry Red
          &colors=Green
          ...
          

          As alluded to in this example, it would greatly simplify the input handling needed when doing multi-select faceting.

          A possible later optimization would be for this OrQParser to return a subclass of BooleanQuery implementing a special interface, so that SolrIndexSearcher would know if/when it could pull out the BooleanClauses and cache them individually.

          (The one concern I have about this idea is the unusualness of the input to this parser being the name of another (multivalued) param: people might confuse it with the more common convention of passing a dereferenced param as input, i.e. '$param'.)
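The proposed parser can be sketched as a function that expands one `{!or ...}` filter into a single boolean query string. This is a hypothetical Python illustration, not a real Solr QParser: the `expand_bool_filter` name, the simplified local-params parsing (space-separated `key=value` pairs, no quoted values), and the handling of only `defType=field` versus plain Lucene syntax are all assumptions. It produces a query string rather than a Lucene BooleanQuery, and it does not model the per-clause caching optimization.

```python
import re

# {!or|and key=value ...}paramName  (simplified: no quoted local-param values)
LOCAL_PARAMS = re.compile(r"\{!(or|and)((?:\s+\w+=[^\s}]+)*)\s*\}(\w+)")


def expand_bool_filter(fq, params):
    """Expand a hypothetical {!or ...}/{!and ...} filter into one boolean query string.

    Returns (query_string, tag). Each value of the referenced multivalued param
    becomes one clause; with defType=field the value is quoted as a literal in
    field `f`, otherwise it is kept as Lucene syntax and parenthesized.
    """
    m = LOCAL_PARAMS.fullmatch(fq)
    if not m:
        raise ValueError(f"not an or/and filter: {fq!r}")
    parser, local_str, param_name = m.groups()
    local = dict(pair.split("=", 1) for pair in local_str.split())
    if local.get("defType") == "field" and "f" not in local:
        raise ValueError("defType=field requires f=<fieldname>")
    values = params.get(param_name, [])
    if not values:
        raise ValueError(f"no values for param {param_name!r}")
    clauses = []
    for value in values:
        if local.get("defType") == "field":
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            clauses.append(f'{local["f"]}:"{escaped}"')
        else:
            clauses.append(f"({value})")
    joiner = " OR " if parser == "or" else " AND "
    return joiner.join(clauses), local.get("tag")


# Multivalued request params, as in the example URL above.
params = {
    "models": ["model:ford", "model:toyota", "model:gm"],
    "colors": ["Cherry Red", "Green"],
}
print(expand_bool_filter("{!or tag=model}models", params))
print(expand_bool_filter("{!or tag=color defType=field f=color}colors", params))
```

Returning the tag alongside the query mirrors how the tag would be used for multi-select faceting exclusions.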

          Jan Høydahl added a comment -

          Brian, if you'd like to get the ball rolling, one way is to write an initial patch and then hope for involvement.
          Another way is to hire someone (see http://wiki.apache.org/solr/Support) for the programming.

          Personally, I like Lance's approach with individually cached sub-filters referenced by variables. Perhaps using the $ syntax as we do for query substitution? fq=($fq1 AND $fq2) OR ($fq3 & $fq4).
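A hypothetical sketch of the `$` substitution suggested here: each `$name` in the expression is replaced by the value of the request parameter `name`, wrapped in parentheses so operator precedence is preserved. The `substitute_params` name, the choice to reject unknown references, and the use of AND throughout are assumptions; this is not existing Solr behavior.

```python
import re

# $name followed by word characters, e.g. $fq1
PARAM_REF = re.compile(r"\$(\w+)")


def substitute_params(expr, params):
    """Replace each $name in expr with "(value of request param name)".

    Parenthesizing each value keeps its internal operators from mixing with
    the surrounding AND/OR. Unknown references raise KeyError.
    """
    def repl(m):
        name = m.group(1)
        if name not in params:
            raise KeyError(f"undefined parameter reference: ${name}")
        return f"({params[name]})"

    return PARAM_REF.sub(repl, expr)


params = {
    "fq1": "type:book",
    "fq2": "inStock:true",
    "fq3": "type:dvd",
    "fq4": "price:[* TO 10]",
}
print(substitute_params("($fq1 AND $fq2) OR ($fq3 AND $fq4)", params))
```

Plain textual substitution like this still yields one combined query string, which would be cached as a single filter; the caching benefit of individually cached sub-filters requires each `$fqN` to be resolved to its own cached doc set instead of being inlined.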

          Brian Pearson added a comment -

          Any chance this will make it into Solr 4? I see there are 8 votes now; is that enough to bump the priority?

          Shawn Heisey added a comment -

          Quoting an earlier comment: "I'd rather see a new filterQuery type like ofq than being stuck with the current options. Nested filterQueries including variables would obviously be the most flexible solution, but IMHO having two different filter types would add enough benefit in the meantime."

          I see that someone else had the same idea long before I did. I brought this up on the solr-user list a few days ago, but I couldn't think of a good parameter name. The one I came up with (fqu, for filter query union) is not as good as ofq.

          I like Brian and Frederik's idea.

          Frederik Kraus added a comment -

          I'd rather see a new filterQuery type like ofq than being stuck with the current options. Nested filterQueries including variables would obviously be the most flexible solution, but IMHO having two different filter types would add enough benefit in the meantime. Just my 2 cents...

          Brian Pearson added a comment -

          Hey folks,

          As I've mentioned above, this issue is very important to me, or rather to my employer. If there was a way to accelerate the development, we'd be interested to talk about it. Perhaps a substantial donation, or some other deal could be worked out. Of course, the feature would need to be implemented properly with the intention of integrating it into the main codeline. I think it would be a valuable addition to Solr especially for anyone doing complex authorization, not to mention the other uses and performance benefits this would have.

          Thanks all

          Brian Pearson added a comment -

          Hey Lance, that solution would be perfect and certainly more powerful than what I suggested. Again, if anyone takes this up, I can help with testing and docs.

          Lance Norskog added a comment -

          The problem with BP's suggestion is that (fq1 & fq2) OR (fq3 & fq4) is not possible. One ends up wanting to do everything with NAND. Or postfix notation.

          To have a tree-structured AND/OR/NOT expression we need to name the individual filter queries and then reference them in a tree.

          fq1=query
          fq2=query
          fq3=query
          fq4=query

          fq=(fq1 AND fq2) OR (fq3 & fq4)
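The named-filter scheme above can be sketched as a tiny recursive-descent evaluator: each name is looked up in a cache of precomputed doc sets, and AND/OR (with `&` and `|` as aliases) combine them. This is an illustrative Python sketch; the grammar (AND binds tighter than OR, parentheses for grouping, no NOT), the use of `frozenset`s as doc sets, and the `evaluate` name are assumptions.

```python
import re

# Tokens: parentheses, AND/OR keywords, & and | as aliases, and filter names.
TOKEN = re.compile(r"\s*(\(|\)|AND\b|OR\b|&|\||\w+)")


def tokenize(expr):
    expr = expr.strip()
    pos, tokens = 0, []
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected input at offset {pos}: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr, named):
    """Evaluate e.g. "(fq1 AND fq2) OR (fq3 & fq4)" over cached doc sets.

    AND/& binds tighter than OR/|; `named` maps filter names to frozensets.
    """
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        result = parse_and()
        while peek() in ("OR", "|"):
            take()
            result = result | parse_and()
        return result

    def parse_and():
        result = parse_atom()
        while peek() in ("AND", "&"):
            take()
            result = result & parse_atom()
        return result

    def parse_atom():
        tok = take()
        if tok == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        if tok in named:
            return named[tok]  # reuse the precomputed (cached) doc set
        raise ValueError(f"unknown filter name or unexpected token: {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens: {tokens[pos:]}")
    return result


named = {
    "fq1": frozenset({1, 2, 3}),
    "fq2": frozenset({2, 3, 4}),
    "fq3": frozenset({5, 6}),
    "fq4": frozenset({6, 7}),
}
print(sorted(evaluate("(fq1 AND fq2) OR (fq3 & fq4)", named)))  # [2, 3, 6]
```

Because every leaf is a cache lookup, changing the expression never invalidates the individual named filters.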

          This can be done without naming filters if we just have something that finds filter queries as subtrees in a given query. If "field:3" is a cached filter query, this tree-walker would find that cached filter in fq="+a +field:3". It would then do a search of "+a" applying the existing filter "field:3".
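The tree-walker idea can be sketched for the simplest case: a flat filter made only of required (`+`) clauses. Pulling a required clause out into an existing cached filter does not change which documents match, so the residual query can be run with that filter applied. This is a hypothetical Python sketch; `plan_filtered_search` is an illustrative name, and phrases, nesting, SHOULD/MUST_NOT clauses, and scoring are out of scope.

```python
def plan_filtered_search(query, filter_cache):
    """Split a flat query of required clauses into (residual query, cached filters).

    Handles only whitespace-separated "+clause" terms, e.g. "+a +field:3".
    """
    clauses = query.split()
    if not clauses or not all(c.startswith("+") and len(c) > 1 for c in clauses):
        raise ValueError("sketch only handles flat queries of required (+) clauses")
    cached, residual = [], []
    for clause in clauses:
        body = clause[1:]
        if body in filter_cache:
            cached.append(body)      # reuse the existing cached filter
        else:
            residual.append(clause)  # still has to be searched
    return " ".join(residual), cached


filter_cache = {"field:3": frozenset({2, 5, 9})}
residual, filters = plan_filtered_search("+a +field:3", filter_cache)
print(residual, filters)  # +a ['field:3']
```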

          I vote for named filters as the most transparent system for dynamically composing filters.

          Brian Pearson added a comment -

          Just putting my thoughts out; I obviously don't have the same understanding as Yonik.

          It feels like specifying which filters are cached doesn't need to be part of this issue. This issue could just be about specifying the boolean logic that gets applied to the Doc Maps created by the filters. Today you have a number of filters F1, F2, ... FN, and the boolean logic is F1 & F2 & ... & FN.

          What I'm hoping for is something simple where you have 2 groups, the AND group and the OR group.
          AND group: A1, A2
          OR group: O1, O2

          So the filter you end up with is (A1 & A2) & (O1 | O2). The caching logic doesn't need to change then, there are still 4 filter queries.

          Real example:

          Current method
          fq=popularity:[10 TO *] OR section:0
          fq=type:2

          New method (assuming we added ofq for the OR'd filter query group; there is probably a better way to design the API):
          ofq=popularity:[10 TO *]
          ofq=section:0
          fq=type:2
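The two-group semantics above, (A1 & A2) & (O1 | O2), can be sketched as below. This is a hypothetical Python model, not Solr code: `ofq` is the proposed (not existing) parameter, `FilterCache` stands in for Solr's filterCache, and filter strings are resolved through a precomputed lookup table instead of a real query parser. It illustrates that each `fq`/`ofq` clause is cached under its own key, so the OR group's pieces are reused across requests.

```python
from typing import Callable, Dict, FrozenSet, Iterable


class FilterCache:
    """Toy per-clause cache keyed by the exact filter string."""

    def __init__(self, compute: Callable[[str], FrozenSet[int]]):
        self._compute = compute
        self._entries: Dict[str, FrozenSet[int]] = {}
        self.misses = 0

    def get(self, fq: str) -> FrozenSet[int]:
        if fq not in self._entries:
            self.misses += 1
            self._entries[fq] = self._compute(fq)
        return self._entries[fq]


def apply_filters(fqs: Iterable[str], ofqs: Iterable[str],
                  cache: FilterCache, all_docs: FrozenSet[int]) -> FrozenSet[int]:
    """Return (fq_1 & ... & fq_n) & (ofq_1 | ... | ofq_m).

    An empty ofq group is treated as "no OR restriction", not "match nothing".
    """
    result = all_docs
    for fq in fqs:
        result &= cache.get(fq)
    ofqs = list(ofqs)
    if ofqs:
        union: FrozenSet[int] = frozenset()
        for ofq in ofqs:
            union |= cache.get(ofq)
        result &= union
    return result


# Precomputed matches stand in for actually running each filter query.
matches = {
    "popularity:[10 TO *]": frozenset({1, 2, 5}),
    "section:0": frozenset({3, 5}),
    "type:2": frozenset({2, 3, 4, 5}),
}
cache = FilterCache(matches.__getitem__)
all_docs = frozenset(range(6))

first = apply_filters(["type:2"], ["popularity:[10 TO *]", "section:0"], cache, all_docs)
second = apply_filters(["type:2"], ["section:0"], cache, all_docs)
print(sorted(first), sorted(second), cache.misses)  # [2, 3, 5] [3, 5] 3
```

The second request needs no new cache entries, whereas the current single fq "popularity:[10 TO *] OR section:0" would occupy one entry that only an identical filter string can reuse.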

          I realize I'm probably missing some important implementation details that Yonik alludes to; I just wanted to get my thoughts down. I wish I were in a position to actually work on this. If someone takes it on, I can help with beta testing and documentation.

          If anyone cares, the reason I want this is that my apps have extremely complicated authorization logic. I can do what I need using fq's, but the filters get large and are very specific to the user who did the search. With OR logic, I could break the filters down into smaller pieces, which would be much more reusable from the cache, and performance would be much better.

          Thanks for listening

          Yonik Seeley added a comment -

          The hardest part of this is passing the information through the Solr APIs to tell Solr to cache the clauses of a particular boolean query separately.

          2 approaches:

          • a SolrQuery that wraps a normal Query and adds extra metadata like "cache separately" or "don't cache"
          • replace Query with SolrFilter (that contains a Query as well as the extra metadata)... involves many deprecations.
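The first approach (a wrapper carrying "cache separately" / "don't cache" metadata) can be sketched as follows. This is a hypothetical Python model rather than Solr's Java API: `WrappedFilter`, `CacheMode`, and `get_docs` are illustrative names, the filter is limited to OR'd clauses, and a plain dict stands in for the filterCache.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


class CacheMode(Enum):
    WHOLE = "whole"      # cache the OR'd filter under one key, as a plain fq is today
    CLAUSES = "clauses"  # "cache separately": one cache entry per clause
    NONE = "none"        # "don't cache": recompute on every request


@dataclass(frozen=True)
class WrappedFilter:
    """Hypothetical wrapper: the OR'd clauses of a filter plus caching metadata."""
    clauses: Tuple[str, ...]
    mode: CacheMode = CacheMode.WHOLE


def _union(sets: Iterable[FrozenSet[int]]) -> FrozenSet[int]:
    out: FrozenSet[int] = frozenset()
    for s in sets:
        out |= s
    return out


def get_docs(f: WrappedFilter, cache: Dict[str, FrozenSet[int]],
             compute: Callable[[str], FrozenSet[int]]) -> FrozenSet[int]:
    """Resolve a wrapped filter to a doc set, honoring its cache metadata."""
    if f.mode is CacheMode.NONE:
        return _union(compute(c) for c in f.clauses)
    if f.mode is CacheMode.WHOLE:
        key = " OR ".join(f.clauses)
        if key not in cache:
            cache[key] = _union(compute(c) for c in f.clauses)
        return cache[key]
    # CacheMode.CLAUSES: look up or fill one entry per clause, then OR them.
    for c in f.clauses:
        if c not in cache:
            cache[c] = compute(c)
    return _union(cache[c] for c in f.clauses)


matches = {"model:ford": frozenset({0, 3}), "model:toyota": frozenset({1, 3})}
cache: Dict[str, FrozenSet[int]] = {}
models = ("model:ford", "model:toyota")
print(sorted(get_docs(WrappedFilter(models, CacheMode.CLAUSES), cache, matches.__getitem__)))
print(sorted(cache))
```

Keeping the metadata on the wrapper means the searcher can decide per filter how to cache, without the query classes themselves having to change.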

            People

            • Assignee: Unassigned
            • Reporter: Brian Pearson
            • Votes: 25
            • Watchers: 21
