  Solr / SOLR-4646

[edismax] let lowercaseOperators default to "false"

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 4.1, 4.2
    • Fix Version/s: 7.0
    • Component/s: query parsers
    • Labels:
      None

      Description

      Documentation says:
      lowercaseOperators
      This param controls whether to try to interpret lowercase words as boolean operators such as "and", "not" and "or". Set &lowercaseOperators=true to allow this. Default is "false".

      But in fact, lowercaseOperators defaults to true.
      And if any boolean operator appears in lowercase in the query, the mm parameter is ignored:

      • q=Young+6+or+Ariston&defType=edismax&qf=name&mm=100%25&debugQuery=true
        "parsedquery_toString": "+((name:young) (name:6) (name:ariston))"
      • q=Young+6+or+Ariston&defType=edismax&qf=name&mm=100%25&lowercaseOperators=false&debugQuery=true
        "parsedquery_toString": "+(((name:young) (name:6) (name:ariston))~3)"
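The only difference between the two requests above is the explicit lowercaseOperators=false. As a hedged illustration, the sketch below builds the second query string from a parameter map, so a client can pin the flag on every request regardless of the server-side default. The class and method names (EdismaxRequest, toQueryString, exampleParams) are hypothetical, and the code only produces URL parameters; it does not call Solr.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class EdismaxRequest {
    /** Encodes params, in insertion order, as an application/x-www-form-urlencoded query string. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URLEncoder turns spaces into '+' and '%' into "%25"
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    /** Parameters of the second example, with lowercaseOperators pinned to false so mm=100% applies. */
    static Map<String, String> exampleParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "Young 6 or Ariston");
        p.put("defType", "edismax");
        p.put("qf", "name");
        p.put("mm", "100%");
        p.put("lowercaseOperators", "false");
        p.put("debugQuery", "true");
        return p;
    }

    public static void main(String[] args) {
        // q=Young+6+or+Ariston&defType=edismax&qf=name&mm=100%25&lowercaseOperators=false&debugQuery=true
        System.out.println(toQueryString(exampleParams()));
    }
}
```

LinkedHashMap is used only so the output matches the parameter order of the example URL.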
      Attachments

      1. SOLR-4646.patch
        5 kB
        Jan Høydahl
      2. SOLR-4646.patch
        383 kB
        Jan Høydahl
      3. SOLR-4646.patch
        5 kB
        Jan Høydahl
      4. SOLR-4646.patch
        4 kB
        Jan Høydahl


          Activity

          tomasflobbe Tomás Fernández Löbbe added a comment -

          This is correct, I just updated the documentation.

          tomasflobbe Tomás Fernández Löbbe added a comment -

          Validated that lowercaseOperators default is "true" in 4.1, 4.2 and trunk. Updated documentation.

          janhoy Jan Høydahl added a comment -

          Wait a minute - I'd argue that the default here should be false and we should change the code?

          An example is that in Norwegian language, the word "and" means "duck", the word "or" means "alder" and "not" means "seine". The same may be true for many other languages, and requiring uppercase operators by default makes perfect sense.

          janhoy Jan Høydahl added a comment -

          Reopening awaiting further discussion

          janhoy Jan Høydahl added a comment -

          Tested this with Solr 3.5, and it seems the default has been "true" all along. Although I have not hit concrete problems due to this, I always thought it defaulted to false.

          Is anyone else in favor of changing the default to "false" from v4.3? We could check luceneMatchVersion to keep the old default for pre-4.3 solrconfigs...

          tomasflobbe Tomás Fernández Löbbe added a comment -

          I think the documentation was wrong: it says Solr does one thing when in reality it does something else, and that is not due to a bug. That's why I think the fix was to change the docs to reflect what Solr really does.
          That said, I'm OK with changing the default for future versions; your comment on Norwegian makes sense to me. If we do so, we should make sure the documentation clearly says that up to 4.2 you get one behavior, and after that you get a different one.

          jkrupan Jack Krupansky added a comment -

          up to 4.2 you get one behavior, and after that you get a different one

          Hmmm... I would think that as a general proposition, intentional, incompatible behavior changes should be limited to major releases of trunk, and not to dot releases, where the expectation is that existing apps should still "just work" unless the issue is clearly a bug. In this case, sure, some people consider it a bug, but the reality is that they just don't like the default for some cases.

          So, if you want to propose this "improvement" for trunk, 5.0, fine. But I don't think it is appropriate to change the rules of the road in a dot release - unless it is a compelling, global problem - which it doesn't appear to be. I mean, the subset of apps that do have an issue with this have a very simple workaround (set lowercaseOperators to false in "defaults" in their query request handler.)

          And to be clear, support for lowercase operators is an intentional feature. Granted, not everyone agrees now.
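Jack's workaround relies on how a request handler's "defaults" section works: a value there applies only when the request does not supply the parameter itself. The sketch below models just that precedence rule with plain maps (the class HandlerDefaults and its method are hypothetical, not Solr's SolrParams API), showing that a handler default of lowercaseOperators=false can still be overridden per request.

```java
import java.util.HashMap;
import java.util.Map;

final class HandlerDefaults {
    /**
     * Models "defaults" precedence: a handler default is used only when
     * the request does not set the same parameter.
     */
    static Map<String, String> effectiveParams(Map<String, String> handlerDefaults,
                                               Map<String, String> request) {
        Map<String, String> merged = new HashMap<>(handlerDefaults);
        merged.putAll(request); // request values win over handler defaults
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of("defType", "edismax", "lowercaseOperators", "false");
        // No explicit flag on the request: the handler default applies
        System.out.println(effectiveParams(defaults, Map.of("q", "peaches and cream"))
                .get("lowercaseOperators")); // false
        // A client that wants lowercase operators can still opt in per request
        System.out.println(effectiveParams(defaults, Map.of("q", "foo or bar", "lowercaseOperators", "true"))
                .get("lowercaseOperators")); // true
    }
}
```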

          janhoy Jan Høydahl added a comment -

          That's what luceneMatchVersion is for - if you upgrade solr.war without bumping the version in solrconfig, you get the "old" behavior. If you start from scratch or migrate your config and bump luceneMatchVersion, you make a conscious choice and can adjust according to CHANGES.txt.

          Thus if the community feels this is a bad default we should not be afraid to rectify that now. Google.com enforces uppercase operators - probably because of user confusion and wrong results with lowercase - so why should we set the bar lower?

          +1 to change the default from true to false (don't care which version, 4.3 or 5.0)

          This issue also touches SOLR-3580 which proposes to replace the lowercaseOperators param with a more flexible variant.

          markrmiller@gmail.com Mark Miller added a comment -

          +1 to change the default from true to false

          +1 - might as well do it in 4.3. I think it's fairly clear that false is a better default. It's great that we offer both.

          dsmiley David Smiley added a comment -

          +1 to change the default from true to false

          +1

          jkrupan Jack Krupansky added a comment -

          -1 but... I won't strenuously resist the proposed change.

          In some sense, it is a reasonable change - it would make edismax more consistent with the traditional Solr query parser.

          But, in a larger sense, the original goal of edismax was to try to establish a new and higher bar of usability - the idea that you have to use uppercase for keywords is a holdover from the crude old days of search. Sure, we have the option to disable this "improvement", but somehow the cultural knowledge of the motivation for that improvement seems to have gotten lost, and now people are retreating to the barbaric past of primitive search.

          Also, the discussion has failed to note that the pf2 and pf3 parameters will actually include the operator keywords for phrase boosting. For example, a query for "in and out or not" (no quotes) will do boosting for "in and out", "and out or", and "out or not", so relevance will be quite decent - even for Norwegian text!

          So, I don't see what significant downside the proposed change would fix, at the cost of defaulting to a more old-fashioned approach to search.
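To make the pf2/pf3 point concrete, the sketch below enumerates the word bigrams and trigrams of the example query. It is a simplified model that splits on whitespace only; edismax's real phrase boosting runs the text through the pf fields' analysis and builds phrase queries, which this sketch does not attempt. The class name Shingles is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Shingles {
    /** Returns every run of n consecutive whitespace-separated words, in order. */
    static List<String> wordNGrams(String query, int n) {
        String[] words = query.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        String q = "in and out or not";
        System.out.println("pf2: " + wordNGrams(q, 2)); // pf2: [in and, and out, out or, or not]
        System.out.println("pf3: " + wordNGrams(q, 3)); // pf3: [in and out, and out or, out or not]
    }
}
```

The trigrams match the three phrases listed in the comment; the operator words stay inside each phrase rather than being dropped.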

          yseeley@gmail.com Yonik Seeley added a comment -

          But, in a larger sense, the original goal of edismax was to try to establish a new and higher bar of usability

          Right - lowercase "and" and "or" were meant to be a natural language improvement (while not being that bad when used as literals instead of operators). The current behavior is certainly not a bug and Tomas had the correct doc fix.

          markrmiller@gmail.com Mark Miller added a comment -

          Right - lowercase "and" and "or" were meant to be a natural language improvement (while not being that bad when used as literals instead of operators).

          That's why it's a nice option. The better default allows you to be explicit in this case. Guessing operators makes for a much better option than a default.

          dsmiley David Smiley added a comment -

          This is an old conversation, but I hit this today (from real user queries) and thought I'd offer my opinion. Allowing lowercase operators, as currently implemented in edismax, is trappy. A user might type "foo bar or baz", and based on how edismax is implemented, "foo" will always be BooleanClause.Occur.SHOULD ('mm' is ignored, 'q.op' is ignored). I'd feel better about it if, in this mode, the default operator were set to AND. Nonetheless, I think this feature is trappy; users don't necessarily know this syntax and its implications.

          elyograg Shawn Heisey added a comment -

          The comment from David Smiley talks about implementation. My opinion below is about the default setting.

          I think the default should be false. There are plenty of situations where the word that gets turned into an operator is better suited as part of the query. Take "q=peaches and cream": here "and" as a term can be extremely important to obtaining relevant results. Depending on the content in the index, it can be a very different query than "q=peaches AND cream". With the current default, you can only get the latter query unless you resort to tricks like escaping a character in the operator:

          q=peaches an\d cream
          

          Lowercase operator support is a feature many people want, so it's a good thing that we have this setting. In my opinion, enabling it by default is the wrong thing to do.
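Shawn's escape trick can be applied on the client side before sending q. The helper below is a hypothetical sketch that assumes edismax only recognizes the exact whitespace-delimited tokens "and" and "or" as lowercase operators (matching the reference guide wording quoted at the end of this issue). It does not track quoted phrases or tokens with attached punctuation such as "(and"; setting lowercaseOperators=false is the simpler fix where you control the request.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OperatorEscaper {
    // Assumption: only these exact lowercase tokens would be read as operators
    private static final Set<String> LOWERCASE_OPERATORS = Set.of("and", "or");
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    /** Inserts a backslash before the last character of each standalone "and"/"or" token. */
    static String escapeLowercaseOperators(String q) {
        Matcher m = TOKEN.matcher(q);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String token = m.group();
            String replacement = LOWERCASE_OPERATORS.contains(token)
                    ? token.substring(0, token.length() - 1) + "\\" + token.charAt(token.length() - 1)
                    : token;
            // quoteReplacement keeps the inserted backslash literal
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb); // keep any trailing whitespace
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeLowercaseOperators("peaches and cream")); // peaches an\d cream
    }
}
```

Uppercase AND/OR are left untouched, so explicit operators keep working.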

          janhoy Jan Høydahl added a comment -

          lowercase "and" and "or" were meant to be a natural language improvement

          Setting a natural language "improvement" as the default when it only works for English content seems odd. Content in any other language uses other words for and/or, so the feature won't help there, and you may get weird behavior instead. That is a bit too much magic for a default setting, so I still support changing the default to false.

          janhoy Jan Høydahl added a comment -

          Reviving this. The 7.0.0 release is excellent timing for changing the lowercaseOperators default from true to false! We'll only commit it to master.

          dsmiley David Smiley added a comment -

          +1 Jan Høydahl! Thanks for taking care of it.

          janhoy Jan Høydahl added a comment -

          Patch

          janhoy Jan Høydahl added a comment -

          New patch that makes the default depend on luceneMatchVersion. That is, we'll keep the old default if people bring over their configs without explicitly bumping the version.

          dsmiley David Smiley added a comment -

          +1 patch looks good.

          janhoy Jan Høydahl added a comment -

          Thanks for the review David!

          Oops, I just saw that I had swapped true and false in the "upgrading" section of CHANGES.txt. New patch attached.

          I'll commit this in a few days.

          jira-bot ASF subversion and git services added a comment -

          Commit 8648b005e28d24b7e18726502ae1ea5fffa44a5c in lucene-solr's branch refs/heads/master from Jan Høydahl
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8648b00 ]

          SOLR-4646: eDismax lowercaseOperators now defaults to "false" for luceneMatchVersion >= 7.0.0
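The rule in the commit message can be sketched as follows: an explicitly set lowercaseOperators wins, and otherwise the default is false for luceneMatchVersion 7.0.0 and later and true for older configs. This is a hypothetical illustration, not the committed patch; it assumes a dotted version string such as "6.6.0" and does not handle older forms like "LUCENE_43". A real implementation would more likely compare org.apache.lucene.util.Version objects.

```java
final class LowercaseOperatorsDefault {
    /** True if a dotted luceneMatchVersion such as "7.0.0" or "6.6.0" is 7.0.0 or later. */
    static boolean isAtLeast7(String luceneMatchVersion) {
        // Assumption: "major.minor.bugfix" form; every 7.x.y is >= 7.0.0, so the major part decides
        int major = Integer.parseInt(luceneMatchVersion.trim().split("\\.")[0]);
        return major >= 7;
    }

    /**
     * Effective lowercaseOperators value. explicitParam is the request/handler value,
     * or null when the parameter was not set at all.
     */
    static boolean effective(String explicitParam, String luceneMatchVersion) {
        if (explicitParam != null) {
            return Boolean.parseBoolean(explicitParam);
        }
        // Old default (true) for pre-7.0 configs, new default (false) otherwise
        return !isAtLeast7(luceneMatchVersion);
    }

    public static void main(String[] args) {
        System.out.println(effective(null, "6.6.0"));   // true
        System.out.println(effective(null, "7.0.0"));   // false
        System.out.println(effective("true", "7.0.0")); // true
    }
}
```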

          jira-bot ASF subversion and git services added a comment -

          Commit 3bb8939afec6b6e26689de580ddc5311bd5f0680 in lucene-solr's branch refs/heads/master from Jan Høydahl
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3bb8939 ]

          SOLR-4646: Add the word "optionally" to refGuide:

          • optionally treats "and" and "or" as "AND" and "OR" in Lucene syntax mode.

            People

            • Assignee:
              janhoy Jan Høydahl
              Reporter:
              anti_social Alexander Koval
            • Votes:
              0
              Watchers:
              8
