SOLR-2649

MM ignored in edismax queries with operators

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: query parsers
    • Labels:
      None

      Description

      Hypothetical scenario:
      1. User searches for "stocks oil gold" with MM set to "50%"
      2. User adds "-stockings" to the query: "stocks oil gold -stockings"
      3. User gets no hits since MM was ignored and all terms were AND-ed together

      The behavior seems to be intentional, although the reason why is never explained:
      // For correct lucene queries, turn off mm processing if there
      // were explicit operators (except for AND).
      boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0;
      (lines 232-234 taken from tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)

      This makes edismax unsuitable as a replacement for dismax; mm is one of the primary features of dismax.
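
The effect of the quoted condition can be illustrated with a minimal, self-contained sketch. It is not the Solr parser: the class name `MmSwitchSketch` and the whitespace tokenizer are assumptions made for illustration, and only the final boolean expression is taken from the quoted source lines.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the quoted check; hypothetical class, not Solr code.
final class MmSwitchSketch {

    // Mirrors the quoted condition: mm is applied only when the query has
    // no OR, NOT, '+' or '-' operators. AND is deliberately not counted.
    static boolean doMinMatched(String query) {
        int numOR = 0, numNOT = 0, numPluses = 0, numMinuses = 0;
        // Assumption: plain whitespace split, no phrase or parenthesis handling.
        List<String> tokens = Arrays.asList(query.trim().split("\\s+"));
        for (String t : tokens) {
            if (t.equals("OR")) numOR++;
            else if (t.equals("NOT")) numNOT++;
            else if (t.startsWith("+")) numPluses++;
            else if (t.startsWith("-")) numMinuses++;
        }
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    public static void main(String[] args) {
        System.out.println(doMinMatched("stocks oil gold"));            // true
        System.out.println(doMinMatched("stocks oil gold -stockings")); // false
        System.out.println(doMinMatched("stocks AND oil"));             // true
    }
}
```

Under this model, adding a single "-stockings" term flips the switch and mm no longer takes part in the query at all, which is the behavior the scenario describes.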

      1. SOLR-2649.diff
        5 kB
        Andrew Buchanan
      2. SOLR-2649.patch
        6 kB
        Jan Høydahl

          Activity

          Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          Brian Carver added a comment -

          I've been following this for at least two years. See my comment from February 2012, above. I can't tell if the proposed fix is a fix. We ought to have the goal: the system behaves in a deterministic way that can be explained to users and that, as little as possible, acts in ways contrary to user expectations (especially silently). The failure to abide by this principle is what made this issue so troubling to me, because users could know that whitespace would be interpreted as "AND" yet they would still get results that discarded the effect that operator should have had.

          Now, of course, users make mistakes. They submit ambiguous queries (or in the case of mm=100% for a disjunctive query, I guess we could call that a self-defeating or self-contradictory query--if I understand mm correctly).

          I still think that what is really needed is (1) a set of default rules for interpreting ambiguous queries that will always provide a deterministic result. These rules could be explained to users, and then what is also needed is that (2) when a user does something that doesn't make sense, given these default rules, they should get an error message.

          The ambiguous query discussed above was one where whitespace was set to "AND" and a user entered:
          (A or B or C) "D E"

          Such a user must be assuming that whitespace within quotation marks is ignored, i.e., that the quotation marks make "D E" a single term that must be matched exactly, and that, given the default to conjunction for non-quoted whitespace, her query will be parsed as:
          (A or B or C) AND "D E"

          that is, as a conjunction with two conjuncts, thus requiring that each conjunct be satisfied to get a matching result.

          My first question then is, what will happen to this query under the new patch? Will it be interpreted as expected?

          My second question is, why not adopt a set of default rules for ambiguous queries? Like the default order of operations in arithmetic, we simply need a convention to interpret 3 + 5 x 4 as 3 + (5 x 4). Just as it didn't matter in arithmetic which operators we favored, so long as everyone knows the convention, it also doesn't really matter what rules we adopt here, so long as we publicize them so users and maintainers know what to expect. I would propose the following:

          1. Whitespace within quotation marks is ignored (in that it is not turned into an operator), that is "D E" is interpreted as a single term that must match exactly.
          2. If a query lacks sufficient parentheses to create an unambiguous query, then the following rules will be applied:
          a. Insert parentheses around every occurrence of AND and its two conjuncts, starting with the rightmost AND.
          b. Insert parentheses in the same fashion for OR.
          c. Right parentheses are never inserted within another set of parentheses, i.e., no existing pairings are broken up.
          3. If one's query is nonsensical, an error message will be displayed explaining the problem. For example, if one has set mm to 100%, requiring every term to match, but yet one also issues a disjunctive query (A OR B) that would be satisfied if either term were to match, then one receives an error indicating that mm cannot be set to 100% while issuing a disjunctive query.

          I think those rules would be sufficient to resolve all ambiguous queries and the general idea that "If you leave out parentheses, then they'll be added to the smallest available units starting from the right, and starting with conjunction" is one that users could (somewhat) easily grasp.
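
Rule 1 above can be sketched in isolation. The following is only an illustration under stated assumptions: the class name `PhraseTokenSketch` and the regular expression are hypothetical, and the sketch does not attempt the parenthesization rules (2a to 2c) or the error reporting of rule 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer illustrating rule 1; not part of Solr.
final class PhraseTokenSketch {

    // A token is either a double-quoted phrase (whitespace kept inside it)
    // or a run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|\\S+");

    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The space inside "D E" does not split the phrase into two terms.
        System.out.println(tokenize("(A or B or C) \"D E\""));
        // prints [(A, or, B, or, C), "D E"]
    }
}
```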

          But, as I said in 2012, my grasp on how solr handles mm is tenuous at best, so perhaps someone will explain that I'm misunderstanding something important.

          Andrew Buchanan added a comment -

          Looks good to me

          Jan Høydahl added a comment -

          Any comments on the current patch? All tests pass. If there are certain boolean queries that you fear this patch will make worse than they are today, please add a unit test for them.

          Jan Høydahl added a comment -

          Attaching new patch with CHANGES entry and a few more tests.

          The query (A OR B) C with mm=100% works as (A OR B) AND C since there are only 2 top-level clauses here, but the query A OR B C with mm=100% is still interpreted as all three clauses being required.

          We could try to detect such cases and combine clauses joined by explicit operators, but that's probably a slippery slope given the messy string parsing in edismax...
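
The difference between the two queries comes down to how many clauses sit at the top level. A rough sketch of that count, assuming a hypothetical helper `countTopLevelClauses` that tracks parenthesis depth and skips bare AND/OR tokens (edismax itself parses queries differently):

```java
// Hypothetical helper; only models the clause counting described above.
final class TopLevelClauseSketch {

    // Counts whitespace-separated clauses at parenthesis depth 0,
    // skipping bare AND/OR operator tokens.
    static int countTopLevelClauses(String query) {
        int depth = 0;
        int count = 0;
        StringBuilder current = new StringBuilder();
        // The trailing space flushes the last token.
        for (char c : (query + " ").toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (Character.isWhitespace(c) && depth == 0) {
                String tok = current.toString();
                if (!tok.isEmpty() && !tok.equals("OR") && !tok.equals("AND")) {
                    count++;
                }
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTopLevelClauses("(A OR B) C")); // 2
        System.out.println(countTopLevelClauses("A OR B C"));   // 3
    }
}
```

With mm=100%, two top-level clauses means both are required, while three top-level clauses means all three are required, which matches the behavior described for the two queries.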

          Andrew Buchanan added a comment -

          Ping for Jan Høydahl to review

          Andrew Buchanan added a comment -

          Here is the initial patch. Really it just involves removing some code and adding a few tests to confirm things work. It also modifies the previously mentioned test to conform with the expectations above.

          Naomi Dushay added a comment -

          I believe the changes Andrew is suggesting sound good. I recently made careful improvements to our CJK resource discovery (I'm in the midst of blogging about it), and in combing through our logs of the last few days, I pulled out a few actual use cases where we have CJK characters and "OR":

          鈴木重雄 OR 日本精神生成史論
          毛澤東 OR 基礎戰
          日報 OR 濟南
          飄 OR 上海

          There are others. Note that we have no actual cases of CJK + non-CJK characters and 'OR'.

          In my relevancy tests for CJK (supplied by East Asian language librarians), I didn't find many useful examples to exercise the case above. I could try to apply a patch locally and check how it affects our ~1000 relevancy tests, but we are currently running Solr 4.4. It would be much more tractable if there is a Solr 4.x patch available for testing.

          Here is the only realistic example I could find in our test code:
          スポーツ OR supotsu
          Both clauses translate to "sports" (from Japanese).

          So from my perspective, the cjk test is a corner case, and I think Andrew's approach sounds great. Tom Burton-West and I are partly behind Robert Muir's fix, so getting Tom BW to weigh in would be great.

          Andrew Buchanan added a comment -

          Just to clarify further with regard to TestExtendedDismaxParser.testCJKStructured, the following is how the queries would change.

          1. "大亚湾 OR bogus" MM=0%
          From: +(((standardtok:大 standardtok:亚 standardtok:湾)) (standardtok:bogus))
          To: +(((standardtok:大 standardtok:亚 standardtok:湾)) (standardtok:bogus))
          Unchanged

          2. "大亚湾 OR bogus" MM=67%
          From: +((((standardtok:大 standardtok:亚 standardtok:湾)~2)) (standardtok:bogus))
          To: +(((((standardtok:大 standardtok:亚 standardtok:湾)~2)) (standardtok:bogus))~1)
          Requires one top level clause to match (effectively the same)

          3. "大亚湾 OR bogus" MM=100%
          From: +((((standardtok:大 standardtok:亚 standardtok:湾)~3)) (standardtok:bogus))
          To: +(((((standardtok:大 standardtok:亚 standardtok:湾)~3)) (standardtok:bogus))~2)
          Requires both top level clauses to match

          This is effectively applying the MM to BOTH the inner clause AND the outer clause separately, which may or may not be what is desired...
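
The "~n" values in the three cases above follow from applying the same percentage to two different clause counts: 3 inner tokens and 2 top-level clauses. A small sketch of that arithmetic, assuming the percentage is truncated toward zero (an assumption about rounding that happens to reproduce the numbers shown; the class and method names are hypothetical):

```java
// Hypothetical arithmetic sketch; reproduces the ~n values listed above.
final class TwoLevelMmSketch {

    // Percentage-only mm, truncated toward zero (assumed rounding rule).
    static int percentOf(int clauseCount, int percent) {
        return (clauseCount * percent) / 100;
    }

    // 3 inner clauses (one per CJK token), 2 top-level clauses.
    static String describe(int percent) {
        int inner = percentOf(3, percent);
        int outer = percentOf(2, percent);
        return "mm=" + percent + "% -> inner ~" + inner + ", outer ~" + outer;
    }

    public static void main(String[] args) {
        for (int percent : new int[] {0, 67, 100}) {
            System.out.println(describe(percent));
        }
        // prints:
        // mm=0% -> inner ~0, outer ~0
        // mm=67% -> inner ~2, outer ~1
        // mm=100% -> inner ~3, outer ~2
    }
}
```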

          Andrew Buchanan added a comment -

          I'm taking a look at fixing this one.

          I've tracked this all the way through the code history and back through the old solr repository. It looks like it was originally submitted this way by Yonik Seeley as SOLR-1553. Any previous history that might explain the reasoning would presumably be in Lucid Imagination's source control system (which I don't have access to). The DisMax parser on which it was based simply used the MM values as passed in, as has been previously noted.

          Hoss Man refers to this behavior as a bug at https://issues.apache.org/jira/browse/SOLR-1553?focusedCommentId=12871244&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12871244 on the original SOLR-1553.

          If you force doMinMatched = true to disable this logic in ExtendedDismaxQParser, everything seems to work as expected above with the exception of one test case that fails (TestExtendedDismaxParser.testCJKStructured). This test case was added as part of r1406437 by Robert Muir for SOLR-3589 - Edismax parser does not honor mm parameter if analyzer splits a token.

          The last query in that test case is "大亚湾 OR bogus" with mm=100% which the test is expecting to evaluate to "((((standardtok:大 standardtok:亚 standardtok:湾)~3)) (standardtok:bogus))". The comment for the test from Robert Muir indicates that it should "always apply minShouldMatch to the inner booleanqueries created from whitespace, as these are never structured lucene queries but only come from unstructured text". Looking at that query though, it seems to me that it should instead evaluate to "(((((standardtok:大 standardtok:亚 standardtok:湾)~3)) (standardtok:bogus))~2)", essentially applying the MM to the top level clauses. I'm certainly not a CJK language expert though, so there may be a subtlety here I'm unaware of.

          I can put together a patch with some test cases to make this behave as folks here seem to expect, but I would like to get some clarification from Robert if possible on whether he agrees that the existing test case should change...

          Ron Davies added a comment -
          Anca Kopetz added a comment - - edited

          We need minimum-should-match (mm) to be applied to edismax query strings that contain operators (AND, OR) together with an mm parameter.

          For example, when the query below was parsed, mm was not applied:

          &q=(((termA AND termB) OR specialTerm) (termC AND termD) (termE))&mm=2&defType=edismax&qf=title
          

          We therefore developed our own custom query parser.
          The code is below; it may be useful to somebody with the same requirements.

          CustomExtendedDismaxQParser.java
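          import java.util.List;

          import com.google.common.base.Strings; // Guava
          import org.apache.lucene.search.BooleanQuery;
          import org.apache.lucene.search.Query;
          import org.apache.solr.common.params.DisMaxParams;
          import org.apache.solr.common.params.SolrParams;
          import org.apache.solr.request.SolrQueryRequest;
          import org.apache.solr.search.ExtendedDismaxQParser;
          import org.apache.solr.util.SolrPluginUtils;

          public class CustomExtendedDismaxQParser extends ExtendedDismaxQParser {
             public CustomExtendedDismaxQParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
                super(qstr, localParams, params, req);
             }

             @Override
             protected Query parseOriginalQuery(ExtendedSolrQueryParser up, String mainUserQuery, List<Clause> clauses, ExtendedDismaxConfiguration config) {
                // Let edismax parse the query as usual, operators included.
                Query query = super.parseOriginalQuery(up, mainUserQuery, clauses, config);
                String mmValue = this.params.get(DisMaxParams.MM);
                // Re-apply mm to the top-level boolean query even though operators were present.
                if (!Strings.isNullOrEmpty(mmValue) && query instanceof BooleanQuery) {
                   SolrPluginUtils.setMinShouldMatch((BooleanQuery) query, mmValue);
                }
                return query;
             }
          }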
          
          solrconfig.xml
          <queryParser name="customEdismax" class="com.kelkoo.search.solr.plugins.CustomExtendedDismaxQParserPlugin"/>
          

          Then we set defType=customEdismax in the query parameters.

          With this configuration, mm is applied to the top-level clauses. In our example, there are 3 top-level SHOULD clauses:

           ((termA AND termB) OR specialTerm), (termC AND termD), (termE)
          

          And the parsed query is :

          +((
              ((+(title:termA) +(title:termB)) (title:specialTerm)) 
              (+(title:termC) +(title:termD)) 
              (title:termE)
            )~2) 
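
To make the "~2" on the outer query concrete, here is a self-contained sketch that evaluates the three top-level clauses against a set of document terms and checks whether at least two of them match. It is a toy model using plain Java predicates, not Lucene; the class name `TopLevelMatchSketch` and the representation of a document as a set of strings are assumptions.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Toy model of the parsed query above; hypothetical, not Lucene code.
final class TopLevelMatchSketch {

    static Predicate<Set<String>> term(String t) {
        return doc -> doc.contains(t);
    }

    // The three top-level SHOULD clauses of the example query.
    static final List<Predicate<Set<String>>> CLAUSES = List.of(
        term("termA").and(term("termB")).or(term("specialTerm")),
        term("termC").and(term("termD")),
        term("termE"));

    // A document matches when at least minShouldMatch top-level clauses match.
    static boolean matches(Set<String> docTerms, int minShouldMatch) {
        long hits = CLAUSES.stream().filter(c -> c.test(docTerms)).count();
        return hits >= minShouldMatch;
    }

    public static void main(String[] args) {
        System.out.println(matches(Set.of("termA", "termB", "termE"), 2));          // true
        System.out.println(matches(Set.of("specialTerm", "termC"), 2));             // false
        System.out.println(matches(Set.of("specialTerm", "termC", "termD"), 2));    // true
        System.out.println(matches(Set.of("termE"), 2));                            // false
    }
}
```

Note that "termC" alone does not satisfy the second clause, because the AND inside it still requires both terms; mm only counts whole top-level clauses.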
          
          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Naomi Dushay added a comment - - edited

          Our dismax mm setting is 6<-1 6<90%.

          I would like our mm to be honored for the top-level SHOULD clauses. Oh please, oh please?

          EDISMAX

          q=customer driven academic library:
          +(((custom)~0.01 (driven)~0.01 (academ)~0.01 (librari)~0.01)~4) 4 hits

          customer NOT driven academic library:
          +((custom)~0.01 -(driven)~0.01 (academ)~0.01 (librari)~0.01) 984300 hits <= INSANE

          customer -driven academic library:
          +((custom)~0.01 -(driven)~0.01 (academ)~0.01 (librari)~0.01) 984300 hits <= INSANE

          customer OR academic OR library NOT driven:
          +((custom)~0.01 (academ)~0.01 (librari)~0.01 -(driven)~0.01) 984300 hits

          customer academic library:
          +(((custom)~0.01 (academ)~0.01 (librari)~0.01)~3) 100 hits

          DISMAX (plausible results!):

          customer driven academic library:
          +(((custom)~0.01 (driven)~0.01 (academ)~0.01 (librari)~0.01)~4) () 4 hits

          customer NOT driven academic library:
          +(((custom)~0.01 -(driven)~0.01 (academ)~0.01 (librari)~0.01)~3) () 96 hits

          customer -driven academic library:
          +(((custom)~0.01 -(driven)~0.01 (academ)~0.01 (librari)~0.01)~3) () 96 hits

          customer academic library:
          +(((custom)~0.01 (academ)~0.01 (librari)~0.01)~3)() 100 hits
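
For readers unfamiliar with conditional mm specs, the following sketch shows how a setting such as "6<-1 6<90%" could be evaluated for a given number of optional clauses. It is modeled on the documented mm syntax rather than copied from Solr, so the class name `MmSpecSketch`, the helper names, and the truncating integer arithmetic are assumptions.

```java
// Simplified evaluator for mm specs; hypothetical, modeled on the documented syntax.
final class MmSpecSketch {

    // Evaluates a spec such as "2", "-1", "75%", or "6<-1 6<90%".
    static int minShouldMatch(int optionalClauses, String spec) {
        int result = optionalClauses;
        spec = spec.trim();
        if (spec.contains("<")) {
            // Conditional form: each "n<value" part applies only when the
            // clause count is greater than n; otherwise the previous result stands.
            for (String part : spec.split("\\s+")) {
                String[] pair = part.split("<");
                int upperBound = Integer.parseInt(pair[0]);
                if (optionalClauses <= upperBound) {
                    return result;
                }
                result = simple(optionalClauses, pair[1]);
            }
            return result;
        }
        return simple(optionalClauses, spec);
    }

    // Plain integer or percentage; negative values count back from the total.
    static int simple(int n, String spec) {
        int value;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            value = (n * percent) / 100; // truncates toward zero (assumption)
        } else {
            value = Integer.parseInt(spec);
        }
        int result = value < 0 ? n + value : value;
        return Math.max(0, Math.min(n, result));
    }

    public static void main(String[] args) {
        String spec = "6<-1 6<90%";
        System.out.println(minShouldMatch(4, spec));  // 4
        System.out.println(minShouldMatch(3, spec));  // 3
        System.out.println(minShouldMatch(10, spec)); // 9
    }
}
```

Under this model, queries with up to six optional clauses require all of them, which is consistent with the ~4 and ~3 suffixes in the dismax output above.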

          Shawn Heisey added a comment -

          Jan Høydahl I like your suggestion. I would want to be sure that if I specify mm (either in the request handler defaults or in my query params), it will ignore q.op and use the value specified. As for coding, I think I'll be useless in this area, though I'm interested in taking a look if anyone can point me at specific class names.

          Jan Høydahl added a comment -

          So, are we ready to agree on wanted behavior and start coding?

          I'll try to formulate a suggestion:

          The presence of explicit operators in the query should not cause mm/q.op to be disregarded entirely:
          • For the pure 0%/q.op=OR case, mm will be 0 and work correctly, as it does today.
          • For the pure 100%/q.op=AND case, mm will be set to the number of top-level SHOULD clauses and work as expected.
          • For the mm=1..n/mm=1%..99% case, mm will be calculated based on the number of top-level SHOULD clauses and work as expected.
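
The suggestion can be sketched as follows: classify each top-level clause, count only the SHOULD clauses, and derive mm from that count. This is an illustration only; the enum, the prefix-based classification, and the truncating percentage are assumptions, not the eventual patch.

```java
import java.util.List;

// Hypothetical sketch of the suggested behavior; not the actual patch.
final class ShouldClauseMmSketch {

    enum Occur { MUST, SHOULD, MUST_NOT }

    // Assumed classification by prefix: '+' is required, '-' is prohibited,
    // anything else is optional.
    static Occur occurOf(String clause) {
        if (clause.startsWith("+")) return Occur.MUST;
        if (clause.startsWith("-")) return Occur.MUST_NOT;
        return Occur.SHOULD;
    }

    // mm is derived from the number of top-level SHOULD clauses only.
    static int mmFor(List<String> topLevelClauses, int percent) {
        long should = topLevelClauses.stream()
            .filter(c -> occurOf(c) == Occur.SHOULD)
            .count();
        return (int) (should * percent / 100); // truncates (assumption)
    }

    public static void main(String[] args) {
        List<String> clauses = List.of("stocks", "oil", "gold", "-stockings");
        System.out.println(mmFor(clauses, 0));   // 0
        System.out.println(mmFor(clauses, 50));  // 1
        System.out.println(mmFor(clauses, 100)); // 3
    }
}
```

In this model the prohibited "-stockings" clause neither counts toward mm nor switches it off, so the remaining three optional terms are still subject to the 50% rule.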

          Shawn Heisey added a comment -

          Thank you! We have been waiting a long time for this fix.

          I'm a little confused here. Were you talking to me? I don't have a fix, I was just saying that I'm having the same problem, and that my problem is not exactly like the initial description. The initial description says that when boolean operators are present, edismax behaves as if mm=100%. I'm seeing the opposite.

          To summarize: When boolean operators are present in the query, two versions of Solr are behaving as if I did not have mm=100%, q.op=AND, or defaultOperator=AND in the schema. Both versions behave as if the default operator is OR. For 3.5, I have tried all three of those options simultaneously. For 4.1, I have tried just the first two, because defaultOperator is deprecated.

          Shawn Heisey added a comment -

          The 3.5 schema also contains this: <solrQueryParser defaultOperator="AND"/>

          That has been removed from the 4.1 schema; q.op in solrconfig.xml is used instead.

          Shawn Heisey added a comment -

          Just ran into this (or something like it) on both my production 3.5 and dev 4.1 servers.

          What I see happening with edismax queries that contain operators is this: Both mm (100%) and q.op (AND) are ignored so that it acts as if q.op were OR. Instead of 8k results, there are over 300k. With a sort parameter, most of the results actually seen are invalid. Here is an actual query from my log:

          ( (young man close up NOT woman NOT couple))

          Robert Muir added a comment -

          Unassigned issues -> 4.1

          John Freier added a comment -

          Hey folks. I ran across this issue after noticing thousands of odd-seeming result sets with good ol' v3.4. I don't know all of the deeper implications, but I think Jack's summary and a couple of the other comments make the most sense, and they match what the current documentation already seems to describe: boolean operators in front of terms or phrases mark them as specifically included or excluded, while all other terms are classified as 'optional'. The documentation on minimum match then states that its percentages are based on the "optional" terms, so I would expect the boolean-specified terms not to be counted and mm to be based only on whatever optional terms there are.

          I saw this was a recent discussion, so I thought I'd chip in; sorry if you've already come to this conclusion and/or implemented it. Is this how the 4.0 Alpha is patched now, or could anyone point me to any sort of temporary solution to achieve this functionality? Thanks for your great work. - John

          Jack Krupansky added a comment -

          I just ran a test with 4.0-BETA and it turns out that overriding the default operator (using the "q.op" parameter) is also ignored when any operator is present, for the exact same reason that "mm" is ignored - since edismax implements q.op using minMatch, which is disabled by the presence of an operator. As commented above, that aspect of the problem has been around for a year now. Wow.

          I'm leaning towards relaxing the "mm" rules so that minMatch will occur regardless of whether operators are present. But, I think the default for "mm" should be "0%", rather than based on "q.op" as is done today.

          I suspect that the restriction on use of minMatch may have been a side effect of having "mm" default based on "q.op". For example, if the user query is "x y +z", they are explicitly detailing which terms should be ANDed, so it wouldn't make sense in that case to apply q.op to x and y, but it still makes sense to apply minMatch to all optional terms. But if no operators are present, THEN you want q.op to apply to each term, and minMatch as well.

          In short, q.op should only apply when no operators are present, but minMatch should apply when either q.op=OR or there are optional terms present.

          I still need to think about the interaction between edismax and the Lucene query parser, especially for nested queries, such as (a b c) AND (d e +f)&q.op=AND. Currently, the minMatch processing in edismax is limited to the top-level BooleanQuery, not any nested queries.
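
          The rule summarized above can be written down as two small predicates. This is only a minimal sketch of the proposal as worded in this comment, not code from edismax; the class and method names are hypothetical, and the caller is assumed to have already counted the optional terms.

```java
// Hypothetical sketch, not Solr code: encodes the rule proposed in this comment.
class QopMmRuleSketch {

    /** q.op is applied only when the user typed no explicit operators. */
    static boolean applyQop(boolean hasExplicitOperators) {
        return !hasExplicitOperators;
    }

    /** minMatch is applied when q.op=OR or when optional terms are present. */
    static boolean applyMinMatch(String qop, int optionalTermCount) {
        return "OR".equalsIgnoreCase(qop) || optionalTermCount > 0;
    }

    public static void main(String[] args) {
        // "x y +z" with q.op=AND: operators are present, x and y stay optional.
        System.out.println(applyQop(true));           // false
        System.out.println(applyMinMatch("AND", 2));  // true
        // "x y z" with q.op=AND: q.op makes every term required, none optional.
        System.out.println(applyQop(false));          // true
        System.out.println(applyMinMatch("AND", 0));  // false
    }
}
```

          Nested queries such as (a b c) AND (d e +f) are deliberately left out of the sketch, since how they should behave is still the open question here.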

          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Hoss Man added a comment -

          Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

          Vadim Kisselmann added a comment -

          Same problem here:
          http://www.mail-archive.com/solr-user@lucene.apache.org/msg65463.html

          In my case, with mm=100% and defaultOperator=AND, I get different results with:
          nascar AND author:serg*
          and
          nascar +author:serg* (with debugQuery I can see that nascar is a SHOULD match, but it should be a MUST match with mm=100%)

          Mike added a comment -

          It seems like we have a general consensus here, but I want to confirm my understanding of a few queries:

          With default operator set to AND, and with no mm param:
          [ a b (c OR d) ] gets interpreted as [ a AND b AND (c OR d) ]

          [ a -b (c OR d) ] gets interpreted as [ a NOT b AND (c OR d) ]

          So basically, the trend here is that wherever the user leaves out an operator, AND is introduced, as Neil seems to be doing. This is really the only way I would expect defaultOperator to work, and I know my users think this way as well.

          If that's the consensus, then it would be great to push this forward (not sure I can help with that though).

          Neil Hooey added a comment -

          I agree with Hoss' view that mm should apply to all top-level SHOULD clauses in the query, especially since that's how it works in dismax.

          At the very least, q.op and/or <defaultOperator> should not get overridden when mm is not specified.

          In our situation we currently have to add an AND operator to fix queries that have a NOT or - operator in them.

          Does anyone have a patch to fix this behaviour?

          Brian Carver added a comment -

          I'm new to solr, so I have a tenuous grasp on some of these issues, but I've understood boolean logic for a couple of decades and it seems to me like solr's current behavior is thwarting the expectations of those who understand what they want and explicitly ask for it. Mike's example above is what troubles me.

          Principles:
          1. The maintainer sets whitespace to be interpreted as AND or OR and solr should do nothing to change that in particular instances.
          2. Where a user inputs an ambiguous query, a default rule about how operator scope will work is needed and that also should not be changed in particular instances.

          So, Mike says he sets whitespace to AND, users know this, and then a user enters:

          Example 1: (A or B or C) "D E"

          Given the above assumptions, the only reasonable interpretation of this is:

          (A or B or C) AND "D E" which is a conjunction with two conjuncts, both of which must be satisfied for a result to be produced, yet Mike/the user gets results that only satisfy one of the conjuncts. That shouldn't happen.

          I'd agree though that how to understand/apply mm in some of the examples above creates hard questions, but that is why many search engines provide two interfaces, one "natural language" interface and one that requires strict use of boolean syntax. Allowing people to enter some boolean operators (which they're going to expect will be respected-no-matter-what) and simultaneously interpreting their query using mm handlers intended for a more rough-and-ready approach is just going to lead to confused end users most of the time. So, in some ways, ignoring mm when operators are used is a feature, not a bug, but that seems orthogonal to the completely unacceptable outcome Mike described: whatever is causing THAT, is a bug.

          Hoss Man added a comment -

          "Counting multiple terms as 1 because they are in parenthesis together doesn't seem like a good idea to me."

          I disagree, but it definitely just seems like a matter of opinion – I don't know that we could ever come up with something that makes sense in all use cases.

          Personally I think the sanest change would be to say that "mm" applies to all top level SHOULD clauses in the query (regardless of whether they have an explicit OR or not) – exactly as it always has in dismax. If a top level clause is a nested boolean query, then "mm" shouldn't apply to its nested clauses, because it doesn't make sense to blur the "count" of how many SHOULD clauses there are at the various levels.

          What would mm=5 mean for a query like "q=X AND Y (a b) (c d) (e f) (g h)" if you looked at all the nested subqueries? That only 5 of those 8 (lowercase) leaf level clauses are required? How would that be implemented on the underlying BooleanQuery objects w/o completely flattening the query (which would break the intent of the user when they grouped them)? It seems like mm=5 (or mm=100%) should mean 5 (or 100%) of the top level SHOULD clauses are required; the default query op should determine how any top level clauses that are BooleanQueries are dealt with.

          ...but that's just my opinion.
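
          To make the counting concrete, here is a minimal sketch of "mm applies to top level SHOULD clauses only". It uses a hypothetical Clause model rather than Lucene's BooleanQuery, and capping a too-large mm at the number of optional clauses is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Lucene's BooleanQuery: just enough structure to
// count clauses at the top level of a parsed query.
class TopLevelMmSketch {

    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final List<Clause> nested; // empty for a leaf term

        Clause(Occur occur, Clause... nested) {
            this.occur = occur;
            this.nested = Arrays.asList(nested);
        }
    }

    /** Counts SHOULD clauses at the top level only; a nested group counts once. */
    static int countTopLevelShould(List<Clause> topLevel) {
        int count = 0;
        for (Clause c : topLevel) {
            if (c.occur == Occur.SHOULD) {
                count++;
            }
        }
        return count;
    }

    /** Assumption of this sketch: a requested mm is capped at the clause count. */
    static int effectiveMm(int requested, int optionalCount) {
        return Math.max(0, Math.min(requested, optionalCount));
    }

    /** An optional group holding two optional leaf terms, e.g. (a b). */
    static Clause pair() {
        return new Clause(Occur.SHOULD, new Clause(Occur.SHOULD), new Clause(Occur.SHOULD));
    }

    /** Clause list for: X AND Y (a b) (c d) (e f) (g h) */
    static List<Clause> example() {
        return Arrays.asList(
            new Clause(Occur.MUST), new Clause(Occur.MUST),
            pair(), pair(), pair(), pair());
    }

    public static void main(String[] args) {
        int optional = countTopLevelShould(example());
        System.out.println(optional);                 // 4
        System.out.println(effectiveMm(5, optional)); // 4
    }
}
```

          Under this reading the eight lowercase leaf terms never enter the count: the query has four optional top level clauses, so mm=5 cannot ask for more than those four.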

          Jan Høydahl added a comment -

          When bringing up all these cases, we may perhaps understand the reason for the current behavior after all. However, it is flawed in assuming that the schema's defaultOperator should be used instead of mm.

          Here's a concrete suggestion for improvement:

          • For mm=0%, mm=100% or no mm specified: Disable mm as today, but induce defaultOperator from the mm value
          • For all other values of mm, use James' method of counting "optional" terms (including OR'ed ones) and apply "mm" to those.

          This would be a big step in the right direction and would probably meet most people's needs.

          James Dyer added a comment -

          It seems it would be simpler to implement and understand if we just counted up the optional words in the query and applied "mm" to those. I suppose you could create a subtle rule that naked terms count for "mm" but OR-ed terms do not. This might be functionality someone wants but then again it might confuse others who would expect "x OR y" to mean the same as "x y".

          Counting multiple terms as 1 because they are in parentheses together doesn't seem like a good idea to me. But then again, maybe someone out there would appreciate all the subtle things you could do with this?

          I guess whatever is decided just needs to be well-documented so when/if someone is surprised by the functionality they can look it up and see what's going on. Whatever is done, it will be a nice improvement over the current behavior.

          Jan Høydahl added a comment -

          Yes, I think the key here is which terms are part of some user-imposed operator (forced MUST or MUST NOT) vs which terms are left dangling in the wild to be subject to mm. But what about this:

          q=word1 AND word2 (word3 OR word4) word5&mm=100%
          

          Should this be interpreted as MUST have word1 AND word2 and set mm=3 for word3, word4, word5? Don't think so. An OR does not mean the same as a "loose" term. This would clearly (perhaps because of the parens) signal that word3 OR word4 should be treated as one unit, not requiring both of them?

          Mike added a comment -

          That makes sense to me and sounds like the simplest, most logical solution.

          I'm mostly in favor of the easiest thing that will make default AND queries work properly as quickly as possible.

          James Dyer added a comment -

          Maybe a simple answer is to have it make "mm" apply to all optional terms and ignore the rest. So for...

          q=word1 AND word2 word3&mm=50%
          

          ..."word3" is the only optional term, so mm=50% only applies to "word3".

          And for...

          q=word1 OR word2 word3 word4 word5&mm=50%
          

          ...Everything here is optional, so "mm" applies to all the terms. Otherwise, you'd be in a situation where "OR" takes on a meaning that is different from "optional" and I'm not sure you want to introduce a 4th concept here beyond what we already have: required/optional/prohibited.

          The semantics of "mm" would then become "the minimum of all optional terms that need to match".
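
          A rough sketch of that counting rule, assuming a flat query with no parentheses and a naive whitespace tokenizer. The class and method names are hypothetical, and the classification (terms joined by AND or prefixed with + are required, terms prefixed with - or following NOT are prohibited, everything else is optional) is only one reading of the suggestion.

```java
// Hypothetical helper, not Solr code: counts the terms that would be subject
// to mm under the "optional terms only" rule. Parentheses are not handled.
class OptionalTermCountSketch {

    static boolean isOperator(String token) {
        return token.equals("AND") || token.equals("OR") || token.equals("NOT");
    }

    static int countOptional(String query) {
        String[] tokens = query.trim().split("\\s+");
        int optional = 0;
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (isOperator(token)) {
                continue; // operators are not terms
            }
            // +term is required, -term is prohibited
            boolean prefixed = token.startsWith("+") || token.startsWith("-");
            // "x AND y" makes both sides required; "NOT y" prohibits y
            boolean afterAndOrNot = i > 0
                && (tokens[i - 1].equals("AND") || tokens[i - 1].equals("NOT"));
            boolean beforeAnd = i + 1 < tokens.length && tokens[i + 1].equals("AND");
            if (!prefixed && !afterAndOrNot && !beforeAnd) {
                optional++; // naked and OR-ed terms both count as optional
            }
        }
        return optional;
    }

    public static void main(String[] args) {
        System.out.println(countOptional("word1 AND word2 word3"));             // 1
        System.out.println(countOptional("word1 OR word2 word3 word4 word5"));  // 5
    }
}
```

          With this rule an explicit OR changes nothing about a term's status, which keeps the model at the three existing concepts: required, optional and prohibited.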

          Jan Høydahl added a comment -

          So how should the parser interpret these examples?

          q=word1 word2 word3 -word4&mm=100%
          

          I agree with Ahmet that here all of word1, word2 and word3 must be required since mm is explicitly specified. If mm is not specified, mm is set from defaultOperator, i.e. AND=>100%, OR=>0%

          q=word1 word2 word3 -word4&mm=50%

          Here you'd expect that two of the three first words must match.

          q=word1 OR word2 word3&mm=100%
          Example after having indexed exampledocs:
          http://localhost:8983/solr/browse?q=ipod%20OR%20samsung%20printer&debugQuery=true&mm=100%25
          

          With ipod OR samsung I get 5 hits. Adding the word "printer" yields 6 hits, i.e. it is OR'ed too. Here I'd expect the equivalent of (word1 OR word2) AND word3.

          q=word1 AND word2 word3&mm=50%
          

          What would you expect for this? Perhaps (word1 AND word2) to be treated as clause1 and word3 as clause2 and then apply mm=1?

          q=word1 OR word2 word3 word4 word5&mm=50%
          

          How about this? Again, it would make sense to respect (word1 OR word2) as one clause and then require two clauses out of the resulting four.
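
          For reference, the arithmetic behind these examples can be sketched as follows. This is a simplified illustration with hypothetical names: it handles only plain "N" and "N%" values (the real mm syntax also allows negative and conditional forms), and it follows the documented Solr behaviour of rounding percentages down, so 50% of three optional clauses comes out as 1 rather than 2.

```java
// Hypothetical sketch, not Solr code: resolves a simple mm value against the
// number of optional clauses, with mm defaulting from the default operator.
class MmDefaultSketch {

    /** When mm is absent, derive it from the default operator (AND=>100%, OR=>0%). */
    static String effectiveMmSpec(String mm, String defaultOperator) {
        if (mm != null) {
            return mm;
        }
        return "AND".equalsIgnoreCase(defaultOperator) ? "100%" : "0%";
    }

    /** Resolves "N" or "N%" (non-negative only) to a clause count. */
    static int minShouldMatch(String spec, int optionalCount) {
        int result;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            result = optionalCount * percent / 100; // integer division rounds down
        } else {
            result = Integer.parseInt(spec);
        }
        // Never require more clauses than exist, never fewer than zero.
        return Math.max(0, Math.min(result, optionalCount));
    }

    public static void main(String[] args) {
        // word1 word2 word3 -word4 has three optional clauses
        System.out.println(minShouldMatch("100%", 3));                         // 3
        System.out.println(minShouldMatch("50%", 3));                          // 1
        System.out.println(minShouldMatch(effectiveMmSpec(null, "AND"), 3));   // 3
        System.out.println(minShouldMatch(effectiveMmSpec(null, "OR"), 3));    // 0
    }
}
```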

          Ron Davies added a comment -

          A significant portion of our users (professional searchers) would never accept this behaviour so this issue is a blocker for us, i.e. prevents us from using edismax (which we would very much like to do).

          Brian Carver added a comment -

          If this bug is responsible for the behavior Mike describes, then I agree with him that this should not be classed "minor" as it results in precisely the opposite behavior that the user/maintainer would anticipate.

          Mike added a comment -

          Yeah, I'm seeing this too. A user has reported that they queried:
          (internet OR online OR web) "personal jurisdiction"

          I have defaultOperator set to AND, so I'd expect the query to get processed as:
          (internet OR online OR web) AND "personal jurisdiction"

          But it is instead getting processed with an OR statement. I've confirmed this using debug.

          It doesn't seem ideal for the default operator to work only until the user tries to override it in parts of a query. This seems like more than a minor issue to me.

          Sean Daugherty added a comment -

          As far as I can tell, q.op is being ignored. In my case, it defaults to "OR" (mm=0%). I'm not sure why it's doing that, but it's certainly not responding to either q.op or <solrQueryParser/>.

          Hoss Man added a comment -

          I believe the intention here was that if a query string contains any query operators (AND/OR/NOT/+/-) then it's assumed the user wants exactly what they asked for, and the "mm" value should not be used.

          I believe in the cases where false==doMinMatched then the q.op (which defaults to <solrQueryParser defaultOperator="..."/>) should come into play, so folks using mm=100%&q.op=AND or mm=0&q.op=OR should already get the behavior they expect (if it's not using q.op then that definitely seems like a bug).

          When people are using middle ground values for mm (i.e. mm=50% etc.) then it definitely seems like we need some way for them to indicate to edismax that the mm should always be used.
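
          The check quoted in the issue description can be approximated as below. This is a sketch with a hypothetical helper and a naive whitespace tokenizer, not the actual parser; it only illustrates why a single "-term" is enough to switch mm off while AND is not.

```java
// Hypothetical helper, not Solr code: mimics the operator check quoted in the
// issue description using a naive whitespace tokenizer.
class OperatorCheckSketch {

    /** Returns true when mm would still be applied under the quoted check. */
    static boolean doMinMatched(String userQuery) {
        int numOR = 0;
        int numNOT = 0;
        int numPluses = 0;
        int numMinuses = 0;
        for (String token : userQuery.trim().split("\\s+")) {
            if (token.equals("OR")) {
                numOR++;
            } else if (token.equals("NOT")) {
                numNOT++;
            } else if (token.length() > 1 && token.startsWith("+")) {
                numPluses++;
            } else if (token.length() > 1 && token.startsWith("-")) {
                numMinuses++;
            }
            // AND is deliberately not counted, matching the quoted comment.
        }
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    public static void main(String[] args) {
        System.out.println(doMinMatched("stocks oil gold"));            // true
        System.out.println(doMinMatched("stocks oil gold -stockings")); // false
        System.out.println(doMinMatched("word1 AND word2 word3"));      // true
    }
}
```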

          Ahmet Arslan added a comment -

          I experienced the same issue. When I added one negative clause to the query string (which has two optional clauses), mm was ignored and the default operator was used instead.
          q=word1 word2 -word3&mm=100%&defType=edismax
          and
          q=word1 word2 -word3&mm=100%&defType=dismax
          return different result sets.

          edismax returns documents containing either word1 or word2, although there are two optional clauses in the query and mm is set to 100%.
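
          When reproducing this over HTTP, note that the "%" in mm=100% has to be URL-encoded as %25. A small sketch for building the two parameter strings to compare, using only the JDK; the class and method names are hypothetical, and the host and request handler are left out because they depend on the installation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the encoded parameter string for one request.
class MmRequestSketch {

    static String params(String q, String mm, String defType) {
        try {
            return "q=" + URLEncoder.encode(q, "UTF-8")
                + "&mm=" + URLEncoder.encode(mm, "UTF-8")
                + "&defType=" + URLEncoder.encode(defType, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same query and mm, differing only in the parser being used.
        System.out.println(params("word1 word2 -word3", "100%", "edismax"));
        System.out.println(params("word1 word2 -word3", "100%", "dismax"));
    }
}
```

          Appending debugQuery=true to both requests shows the parsed query, which makes it easy to see whether the optional clauses carry a minimum-match requirement.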


            People

            • Assignee: Unassigned
              Reporter: Magnus Bergmark
            • Votes: 37
              Watchers: 42