Solr
  1. Solr
  2. SOLR-435

QParser must validate existence/absence of "q" parameter

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: search
    • Labels:
      None

      Description

      Each QParser should check if "q" exists or not. For some it will be required others not.
      currently it throws a null pointer:

      java.lang.NullPointerException
      	at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:36)
      	at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104)
      	at org.apache.solr.search.QParser.getQuery(QParser.java:80)
      	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:67)
      	at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:150)
              ...
      

      see:
      http://www.nabble.com/query-parsing-error-to14124285.html#a14140108

      1. SOLR-2001_3x_backport_with_empty_string_check_and_test.patch
        10 kB
        David Smiley
      2. SOLR-435_3x_consistent_errors.patch
        4 kB
        Hoss Man
      3. SOLR-435.patch
        11 kB
        Hoss Man
      4. SOLR-435_q_defaults_to_all-docs.patch
        6 kB
        David Smiley

        Issue Links

          Activity

          Hide
          Yonik Seeley added a comment -

          Or more correctly, verify that it's query string is non-null if it was expecting one and doesn't have a backup plan or default semantics.

          Show
          Yonik Seeley added a comment - Or more correctly, verify that it's query string is non-null if it was expecting one and doesn't have a backup plan or default semantics.
          Hide
          Shalin Shekhar Mangar added a comment -

          Committed revision 741046.

          I have committed Lars's patch at SOLR-525.

          Show
          Shalin Shekhar Mangar added a comment - Committed revision 741046. I have committed Lars's patch at SOLR-525 .
          Hide
          Shalin Shekhar Mangar added a comment -

          Reverting the commit and re-opening the issue.

          Show
          Shalin Shekhar Mangar added a comment - Reverting the commit and re-opening the issue.
          Hide
          Shalin Shekhar Mangar added a comment -

          Marked for 1.5

          Show
          Shalin Shekhar Mangar added a comment - Marked for 1.5
          Hide
          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Show
          Hoss Man added a comment - Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed. A unique token for finding these 240 issues in the future: hossversioncleanup20100527
          Hide
          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Show
          Robert Muir added a comment - Bulk move 3.2 -> 3.3
          Hide
          Robert Muir added a comment -

          3.4 -> 3.5

          Show
          Robert Muir added a comment - 3.4 -> 3.5
          Hide
          David Smiley added a comment - - edited

          I've always thought it was bad that Solr throws an error if 'q' is not specified, and it can be a "WTF" experience to new users unfamiliar with this. I am aware that dismax has q.alt but that's separate. What's wrong with assuming a match-all docs query of *:*? It's better than an error, IMO.

          FYI aside from seeing this from time to time in my own Solr work, the Ajax-Solr framework has a work-around to basically automatically supply q or q.alt with '*:*' in different circumstances, and it's quite a hack. You can argue the user should explicitly supply *:* but then if they forget, they get a Solr error.

          Show
          David Smiley added a comment - - edited I've always thought it was bad that Solr throws an error if 'q' is not specified, and it can be a "WTF" experience to new users unfamiliar with this. I am aware that dismax has q.alt but that's separate. What's wrong with assuming a match-all docs query of *:*? It's better than an error, IMO. FYI aside from seeing this from time to time in my own Solr work, the Ajax-Solr framework has a work-around to basically automatically supply q or q.alt with '*:*' in different circumstances, and it's quite a hack. You can argue the user should explicitly supply *:* but then if they forget, they get a Solr error.
          Hide
          Hoss Man added a comment -

          Issue is marked 3.6 and actively being discussed but has no assignee - assigning to most active committer contributing patches/discussion so far to triage wether this can/should be pushed to 4.0 or not.

          Show
          Hoss Man added a comment - Issue is marked 3.6 and actively being discussed but has no assignee - assigning to most active committer contributing patches/discussion so far to triage wether this can/should be pushed to 4.0 or not.
          Hide
          David Smiley added a comment -

          Attached is a patch that I'd like to apply on Monday end of day (EST). My changes.txt is as follows:

          • SOLR-435: For the lucene, dismax, and edismax query parsers: if no query string is supplied, or if its blank or just whitespace, then the default is to match all documents. It used to be an error. This basically means 'q' isn't required and q.alt=: is redundant. (dsmiley)

          I also discovered and fixed a bug in edismax in which a query of pure whitespace threw an error. dismax didn't do this. It was a trivial fix in which a trim() was forgotten.

          There are many ways to approach addressing this issue with different pros/cons; it was hard to ultimately settle on what I chose to do. I could have modified QueryComponent, or just QParser, or just the actual QParser subclasses. A universal choice couldn't be made for all qparsers because most qparsers don't even use the query string. Feedback please!

          Show
          David Smiley added a comment - Attached is a patch that I'd like to apply on Monday end of day (EST). My changes.txt is as follows: SOLR-435 : For the lucene, dismax, and edismax query parsers: if no query string is supplied, or if its blank or just whitespace, then the default is to match all documents. It used to be an error. This basically means 'q' isn't required and q.alt= : is redundant. (dsmiley) I also discovered and fixed a bug in edismax in which a query of pure whitespace threw an error. dismax didn't do this. It was a trivial fix in which a trim() was forgotten. There are many ways to approach addressing this issue with different pros/cons; it was hard to ultimately settle on what I chose to do. I could have modified QueryComponent, or just QParser, or just the actual QParser subclasses. A universal choice couldn't be made for all qparsers because most qparsers don't even use the query string. Feedback please!
          Hide
          Hoss Man added a comment -

          if no query string is supplied, or if its blank or just whitespace, then the default is to match all documents.

          -0 ... the risk with this approach is that (new) users who make typos in queries or are missinformed about the name "q" param (ie: /search?Q=foo or /search?query=foo) will be really confused when they query they specify is completely ignored w/o explanation and all docs are returned in it's place. I think it's much better to throw a clear error "q param is not specified" but i certainly see the value in adding q.alt support to the LuceneQParser with the same semantics as dismax (used if q is missing or all whitespace) .. not sure why we've never considered that before. (obviosly it wouldn't make sense for all QParsers, like "field" or "term" since all whitespace and or empty strings are totally valid input for them)

          I could have modified QueryComponent, or just QParser, or just the actual QParser subclasses. A universal choice couldn't be made for all qparsers...

          we definitely shouldn't modify QueryComponent ... the entire point of the issue is that QueryComponent can't attempt to validate the q param, because it doesn't know if/when the defType QParser requires it to exist – the individual QParsers all need to throw clear errors if they require it and it's not specified, that's really the whole reason this issue was opened in the first place

          Show
          Hoss Man added a comment - if no query string is supplied, or if its blank or just whitespace, then the default is to match all documents. -0 ... the risk with this approach is that (new) users who make typos in queries or are missinformed about the name "q" param (ie: /search?Q=foo or /search?query=foo ) will be really confused when they query they specify is completely ignored w/o explanation and all docs are returned in it's place. I think it's much better to throw a clear error "q param is not specified" but i certainly see the value in adding q.alt support to the LuceneQParser with the same semantics as dismax (used if q is missing or all whitespace) .. not sure why we've never considered that before. (obviosly it wouldn't make sense for all QParsers, like "field" or "term" since all whitespace and or empty strings are totally valid input for them) I could have modified QueryComponent, or just QParser, or just the actual QParser subclasses. A universal choice couldn't be made for all qparsers... we definitely shouldn't modify QueryComponent ... the entire point of the issue is that QueryComponent can't attempt to validate the q param, because it doesn't know if/when the defType QParser requires it to exist – the individual QParsers all need to throw clear errors if they require it and it's not specified, that's really the whole reason this issue was opened in the first place
          Hide
          Chris Male added a comment -

          we definitely shouldn't modify QueryComponent ... the entire point of the issue is that QueryComponent can't attempt to validate the q param, because it doesn't know if/when the defType QParser requires it to exist – the individual QParsers all need to throw clear errors if they require it and it's not specified, that's really the whole reason this issue was opened in the first place

          If the purpose of the QueryComponent is to be QParser agnostic and consequently unable to know if the 'q' parameter is even relevant, shouldn't it be up to the QParser to retrieve what it believes the query string to be from the request parameters? Currently QueryComponent assumes the 'q' parameter value is the query string, sets that as being the case in the ResponseBuilder and passes it down to the QParser.

          Show
          Chris Male added a comment - we definitely shouldn't modify QueryComponent ... the entire point of the issue is that QueryComponent can't attempt to validate the q param, because it doesn't know if/when the defType QParser requires it to exist – the individual QParsers all need to throw clear errors if they require it and it's not specified, that's really the whole reason this issue was opened in the first place If the purpose of the QueryComponent is to be QParser agnostic and consequently unable to know if the 'q' parameter is even relevant, shouldn't it be up to the QParser to retrieve what it believes the query string to be from the request parameters? Currently QueryComponent assumes the 'q' parameter value is the query string, sets that as being the case in the ResponseBuilder and passes it down to the QParser.
          Hide
          David Smiley added a comment -

          IMO the bear minimum change should be a clear error that 'q' is missing. Right now you get a NullPointerException – Bad!

          Another change that I've always thought should happen that would address this issue is /search in the example config being configured with defType=edismax with q.alt=*:*

          Chris, as both Hoss and I said, QueryComponent can't assume that the particular QParser subclass even needs the query string at all, such as when defType=term or something. Granted that's very unusual but still.

          Shall I just go ahead and commit the missing trim() bug fix to edismax, or create an issue for it?

          Show
          David Smiley added a comment - IMO the bear minimum change should be a clear error that 'q' is missing. Right now you get a NullPointerException – Bad! Another change that I've always thought should happen that would address this issue is /search in the example config being configured with defType=edismax with q.alt=*:* Chris, as both Hoss and I said, QueryComponent can't assume that the particular QParser subclass even needs the query string at all, such as when defType=term or something. Granted that's very unusual but still. Shall I just go ahead and commit the missing trim() bug fix to edismax, or create an issue for it?
          Hide
          Chris Male added a comment -

          Chris, as both Hoss and I said, QueryComponent can't assume that the particular QParser subclass even needs the query string at all, such as when defType=term or something. Granted that's very unusual but still.

          That's exactly what I'm saying. The QueryComponent is making that assumption today. It retrieves a query string from the 'q' param and passes it to the QParser. Why not let the QParser do this all itself? that way, if the QParser doesn't need a query string or gets it from somewhere else, it can choose what to do or what not to do.

          Show
          Chris Male added a comment - Chris, as both Hoss and I said, QueryComponent can't assume that the particular QParser subclass even needs the query string at all, such as when defType=term or something. Granted that's very unusual but still. That's exactly what I'm saying. The QueryComponent is making that assumption today. It retrieves a query string from the 'q' param and passes it to the QParser. Why not let the QParser do this all itself? that way, if the QParser doesn't need a query string or gets it from somewhere else, it can choose what to do or what not to do.
          Hide
          David Smiley added a comment -

          Chris: If you look at QueryComponent a little further down the line it has to use QParser again to pass in filter queries in 'fq' not 'q'.

          Show
          David Smiley added a comment - Chris: If you look at QueryComponent a little further down the line it has to use QParser again to pass in filter queries in 'fq' not 'q'.
          Hide
          Chris Male added a comment -

          You're absolutely right.

          I can see the advantages of having QParser's accept a generic String that they need to parse, whether it be the main query string or a filter query. But in that case I don't think QParser's should also be trying to find the query string themselves in certain cases (such as ExtendedDisMaxQParser trying q.alt as an alternative source of a query string). QParsers are not required to respect what the QueryComponent gives them, so you can specify whatever you like in 'q' or 'fq' and QParser can totally ignore it.

          It just feels a little messy and confusing because wherever the query string identification occurs is where the validation should occur too, right?

          Show
          Chris Male added a comment - You're absolutely right. I can see the advantages of having QParser's accept a generic String that they need to parse, whether it be the main query string or a filter query. But in that case I don't think QParser's should also be trying to find the query string themselves in certain cases (such as ExtendedDisMaxQParser trying q.alt as an alternative source of a query string). QParsers are not required to respect what the QueryComponent gives them, so you can specify whatever you like in 'q' or 'fq' and QParser can totally ignore it. It just feels a little messy and confusing because wherever the query string identification occurs is where the validation should occur too, right?
          Hide
          David Smiley added a comment -

          FYI I created an issue for this edismax bug in SOLR-3270 and committed it.

          Hoss: I don't agree with your reasoning on the developer-user typo-ing the 'q' parameter. If you mistype basically any parameter then clearly it is as if you didn't even specify that parameter and you get the default behavior of the parameter you were trying to type correctly but didn't. In this case, we want q to be match-all-docs by default. It's not uncommon for there to be no query – initial state or just filter queries.

          Show
          David Smiley added a comment - FYI I created an issue for this edismax bug in SOLR-3270 and committed it. Hoss: I don't agree with your reasoning on the developer-user typo-ing the 'q' parameter. If you mistype basically any parameter then clearly it is as if you didn't even specify that parameter and you get the default behavior of the parameter you were trying to type correctly but didn't. In this case, we want q to be match-all-docs by default. It's not uncommon for there to be no query – initial state or just filter queries.
          Hide
          Hoss Man added a comment -

          If the purpose of the QueryComponent is to be QParser agnostic and consequently unable to know if the 'q' parameter is even relevant, shouldn't it be up to the QParser to retrieve what it believes the query string to be from the request parameters?

          Sorry ... i chose my words carelessly and wound up saying almost the exact opposite of what i ment.

          What i should have said...

          • QueryComponent is responsible for determining the QParser to use for the main query and passing it the value of the q query-string param to the QParser.getParser(...) method
          • QParser.getParser passes that query-string on to whater QParserPlugin was selected as the "qstr" param to the createParser
          • The QParser that gets created by the createParser call should do whatever validation it needs to do (including a null check) in it's parse() method

          In answer to your questions...

          • QueryComponent can not do any validation of the q param, because it can't make any assumptions about what the defType QParser this are legal values – not even a null check, because in case of things like dismax nll is perfectly fine
          • QParsers (and QParserPlugins) can't be made responsible for fetching the q param because they don't know if/when they are being used to parse the main query param, vs fq params, vs some other nested subquery
          • by putting this kind of validation/error checking in the QParser.parse method, we ensure that it is used properly even when the QParser(s) are used for things like 'fq' params or in nested subqueries

          Hoss: I don't agree with your reasoning on the developer-user typo-ing the 'q' parameter. If you mistype basically any parameter then clearly it is as if you didn't even specify that parameter and you get the default behavior of the parameter you were trying to type correctly but didn't.

          understood ... but most other situations the "default" behavior is either "do nothing" or "error" ... we don't have a lot of default behaviors which are "give me tones of stuff" ... if you use facet=true&faceet.field=foo (note the extra character) you don't silently get get faceting on every field as a default – you get no field faceting at all. if you misstype the q param name and get an error on your first attempt you immediately understand you did something wrong. likewise if we made the default a "matches nothing" query, then you'd get no results and (hopefully) be suspicious enough to realize you made a mistake – but if we give you a bunch of results by default you may not realize at all that you're looking at all results not just the results of what you thought the query was. the only situations i can think of where forgetting or mistyping a param name doens't default to error or nothing are things with fixed expectations: start, rows, fl, etc... Those have defaults that (if they don't match what you tried to specify) are immediately obvious ... the 'start' attribute on the docList returned is wrong, you get more results then you expected, you get field names you know you didn't specify, etc... it's less obvious when you are looking at the results of a query that it's a match-all query instead of the query you thought you were specifying.

          like i said ... i'm -0 to having a hardcoded default query for lucene/dismax/edismax ... if you feel strongly about it that's fine, allthough i would try to convince you "match none" is a better hardcoded default then 'match all' (so that it's easier to recognize mistakes quickly) and really don't think we should do it w/o also add q.alt support to the LuceneQParser so people can override it.

          Show
          Hoss Man added a comment - If the purpose of the QueryComponent is to be QParser agnostic and consequently unable to know if the 'q' parameter is even relevant, shouldn't it be up to the QParser to retrieve what it believes the query string to be from the request parameters? Sorry ... i chose my words carelessly and wound up saying almost the exact opposite of what i ment. What i should have said... QueryComponent is responsible for determining the QParser to use for the main query and passing it the value of the q query-string param to the QParser.getParser(...) method QParser.getParser passes that query-string on to whater QParserPlugin was selected as the "qstr" param to the createParser The QParser that gets created by the createParser call should do whatever validation it needs to do (including a null check) in it's parse() method In answer to your questions... QueryComponent can not do any validation of the q param, because it can't make any assumptions about what the defType QParser this are legal values – not even a null check, because in case of things like dismax nll is perfectly fine QParsers (and QParserPlugins) can't be made responsible for fetching the q param because they don't know if/when they are being used to parse the main query param, vs fq params, vs some other nested subquery by putting this kind of validation/error checking in the QParser.parse method, we ensure that it is used properly even when the QParser(s) are used for things like 'fq' params or in nested subqueries Hoss: I don't agree with your reasoning on the developer-user typo-ing the 'q' parameter. If you mistype basically any parameter then clearly it is as if you didn't even specify that parameter and you get the default behavior of the parameter you were trying to type correctly but didn't. understood ... but most other situations the "default" behavior is either "do nothing" or "error" ... we don't have a lot of default behaviors which are "give me tones of stuff" ... if you use facet=true&faceet.field=foo (note the extra character) you don't silently get get faceting on every field as a default – you get no field faceting at all. if you misstype the q param name and get an error on your first attempt you immediately understand you did something wrong. likewise if we made the default a "matches nothing" query, then you'd get no results and (hopefully) be suspicious enough to realize you made a mistake – but if we give you a bunch of results by default you may not realize at all that you're looking at all results not just the results of what you thought the query was. the only situations i can think of where forgetting or mistyping a param name doens't default to error or nothing are things with fixed expectations: start, rows, fl, etc... Those have defaults that (if they don't match what you tried to specify) are immediately obvious ... the 'start' attribute on the docList returned is wrong, you get more results then you expected, you get field names you know you didn't specify, etc... it's less obvious when you are looking at the results of a query that it's a match-all query instead of the query you thought you were specifying. like i said ... i'm -0 to having a hardcoded default query for lucene/dismax/edismax ... if you feel strongly about it that's fine, allthough i would try to convince you "match none" is a better hardcoded default then 'match all' (so that it's easier to recognize mistakes quickly) and really don't think we should do it w/o also add q.alt support to the LuceneQParser so people can override it.
          Hide
          Ryan McKinley added a comment -

          if no query string is supplied, or if its blank or just whitespace, then the default is to match all documents.

          -0

          When I opened this issue (4 years ago!) I was only worried that you get a NPE from a missing 'q'

          don't think we should do it w/o also add q.alt support to the LuceneQParser so people can override it.

          +1

          Match none seems like the most appropriate behavior unless you explicitly say something else

          Show
          Ryan McKinley added a comment - if no query string is supplied, or if its blank or just whitespace, then the default is to match all documents. -0 When I opened this issue (4 years ago!) I was only worried that you get a NPE from a missing 'q' don't think we should do it w/o also add q.alt support to the LuceneQParser so people can override it. +1 Match none seems like the most appropriate behavior unless you explicitly say something else
          Hide
          David Smiley added a comment - - edited

          If match none were the default, then this would be an additional difference from a filter query aside from scoring implications. Think about it. If you don't add any filter queries, in effect there is no filtering, which is kinda like the default being *:* even though it strictly speaking doesn't work that way. Shouldn't 'q' work similarly?

          I'll commit a better error message later tonight:

              if (qstr == null || qstr.length() == 0)
                  throw new ParseException("missing query string");
          

          inside LuceneQParserPlugin.parse(). This is the same error message dismax & edismax use.

          v3.6 is soon and I don't feel like fighting this issue once the error message is improved.

          Show
          David Smiley added a comment - - edited If match none were the default, then this would be an additional difference from a filter query aside from scoring implications. Think about it. If you don't add any filter queries, in effect there is no filtering, which is kinda like the default being *:* even though it strictly speaking doesn't work that way. Shouldn't 'q' work similarly? I'll commit a better error message later tonight: if (qstr == null || qstr.length() == 0) throw new ParseException( "missing query string" ); inside LuceneQParserPlugin.parse(). This is the same error message dismax & edismax use. v3.6 is soon and I don't feel like fighting this issue once the error message is improved.
          Hide
          Chris Male added a comment -

          QParsers (and QParserPlugins) can't be made responsible for fetching the q param because they don't know if/when they are being used to parse the main query param, vs fq params, vs some other nested subquery

          I agree. I just have problems with the fact they kind of do this today. Both DisMaxQParser and ExtendedDisMaxParser check the q.alt parameter when the string given to them is null. So imagine if they're being used to parse a filter query but for whatever reason they are given a null string so they use the q.alt value instead, even though thats totally unrelated.

          by putting this kind of validation/error checking in the QParser.parse method, we ensure that it is used properly even when the QParser(s) are used for things like 'fq' params or in nested subqueries

          Again I agree. But I'm just not sure if that validation / error checking should involve checking alternative parameters. That feels like its defeating the goal of QParsers working in all situations.

          I just also wonder whether down the line we want better error messages here too. David's suggestion for "missing query string" aligns with other such messages (so I'm all for adding it). But I also wonder whether we could do better as well, by having context dependent messages saying what parameters were missing and how the user can address it. But that would require the QParser to know what its being used for which we don't want. Hence why I think this is really messy.

          I'll leave this discussion here and open another issue at some stage to address this.

          allthough i would try to convince you "match none" is a better hardcoded default then 'match all'

          I agree with Hoss. If no query was mentioned, nothing was matched.

          If match none were the default, then this would be an additional difference from a filter query aside from scoring implications

          I see them as very different things so I'm not bothered by the difference.

          Show
          Chris Male added a comment - QParsers (and QParserPlugins) can't be made responsible for fetching the q param because they don't know if/when they are being used to parse the main query param, vs fq params, vs some other nested subquery I agree. I just have problems with the fact they kind of do this today. Both DisMaxQParser and ExtendedDisMaxParser check the q.alt parameter when the string given to them is null. So imagine if they're being used to parse a filter query but for whatever reason they are given a null string so they use the q.alt value instead, even though thats totally unrelated. by putting this kind of validation/error checking in the QParser.parse method, we ensure that it is used properly even when the QParser(s) are used for things like 'fq' params or in nested subqueries Again I agree. But I'm just not sure if that validation / error checking should involve checking alternative parameters. That feels like its defeating the goal of QParsers working in all situations. I just also wonder whether down the line we want better error messages here too. David's suggestion for "missing query string" aligns with other such messages (so I'm all for adding it). But I also wonder whether we could do better as well, by having context dependent messages saying what parameters were missing and how the user can address it. But that would require the QParser to know what its being used for which we don't want. Hence why I think this is really messy. I'll leave this discussion here and open another issue at some stage to address this. allthough i would try to convince you "match none" is a better hardcoded default then 'match all' I agree with Hoss. If no query was mentioned, nothing was matched. If match none were the default, then this would be an additional difference from a filter query aside from scoring implications I see them as very different things so I'm not bothered by the difference.
          Hide
          Hoss Man added a comment -

          FYI: i started cranking out a test patch to try and verify that all OOTB QParsers were doing the necessary vaidation, and realized poking at the code that we already addressed a lot of these discussions back in SOLR-2001, it just didn't make it into javadocs.

          QueryCOmponent allows and expects QParsers to return null from their parse() method if they think the input was valid, but doesn't result in an actual query, and then QueryCOmponent implements a standard default behavior of "match nothing"

          Show
          Hoss Man added a comment - FYI: i started cranking out a test patch to try and verify that all OOTB QParsers were doing the necessary vaidation, and realized poking at the code that we already addressed a lot of these discussions back in SOLR-2001 , it just didn't make it into javadocs. QueryCOmponent allows and expects QParsers to return null from their parse() method if they think the input was valid, but doesn't result in an actual query, and then QueryCOmponent implements a standard default behavior of "match nothing"
          Hide
          Hoss Man added a comment -

          Again I agree. But I'm just not sure if that validation / error checking should involve checking alternative parameters. That feels like its defeating the goal of QParsers working in all situations.

          Not sure i see the problem, ... part of the advantage in how q.alt it's implemented now is that you can put things like...

           <str name="fq">!dismax q.alt=*:* v=$keywords}</str>
          

          ...into "appends" params in your solrconfig. by default nothing is filtered, but if the client provides a "keywords" param then it's used.

          I just also wonder whether down the line we want better error messages here too. David's suggestion for "missing query string" aligns with other such messages

          It wouldn't have to ... parse() can throw ParseExceptions and QueryCOmponent (or whatever delegated to the QParser can wrap that in a user error (QueryCOmponent already does that)

          Show
          Hoss Man added a comment - Again I agree. But I'm just not sure if that validation / error checking should involve checking alternative parameters. That feels like its defeating the goal of QParsers working in all situations. Not sure i see the problem, ... part of the advantage in how q.alt it's implemented now is that you can put things like... <str name="fq">!dismax q.alt=*:* v=$keywords}</str> ...into "appends" params in your solrconfig. by default nothing is filtered, but if the client provides a "keywords" param then it's used. I just also wonder whether down the line we want better error messages here too. David's suggestion for "missing query string" aligns with other such messages It wouldn't have to ... parse() can throw ParseExceptions and QueryCOmponent (or whatever delegated to the QParser can wrap that in a user error (QueryCOmponent already does that)
          Hide
          Chris Male added a comment -

          Not sure i see the problem, ... part of the advantage in how q.alt it's implemented now is that you can put things like...

          Yeah that is a cool way to to use q.alt. But that is only usable when a q.alt is specified in the local params of the QParser. We don't prevent users from specifying it as a normal param, and then the QParser can be looking up a parameter that is not intended to be for it. For safety, shouldn't we constrain such parameters to being local params only?

          It wouldn't have to ... parse() can throw ParseExceptions and QueryCOmponent (or whatever delegated to the QParser can wrap that in a user error (QueryCOmponent already does that)

          It doesn't really do that though. Take DisMaxQParser for example, if it can't find a query string through either its given string or through the q.alt param, it throws a SolrException which isn't caught by QueryComponent. So there isn't any context here except through the stack trace. At best the QueryComponent catches a ParseException but since it wraps both the parsing of the main query and filter queries into the same try-catch block, it just wraps the ParseException in a SolrException and throws it. Again, no context.

          Show
          Chris Male added a comment - Not sure i see the problem, ... part of the advantage in how q.alt it's implemented now is that you can put things like... Yeah that is a cool way to to use q.alt. But that is only usable when a q.alt is specified in the local params of the QParser. We don't prevent users from specifying it as a normal param, and then the QParser can be looking up a parameter that is not intended to be for it. For safety, shouldn't we constrain such parameters to being local params only? It wouldn't have to ... parse() can throw ParseExceptions and QueryCOmponent (or whatever delegated to the QParser can wrap that in a user error (QueryCOmponent already does that) It doesn't really do that though. Take DisMaxQParser for example, if it can't find a query string through either its given string or through the q.alt param, it throws a SolrException which isn't caught by QueryComponent. So there isn't any context here except through the stack trace. At best the QueryComponent catches a ParseException but since it wraps both the parsing of the main query and filter queries into the same try-catch block, it just wraps the ParseException in a SolrException and throws it. Again, no context.
          Hide
          Hoss Man added a comment -

          It doesn't really do that though. Take DisMaxQParser for example, if it can't find a query string through either its given string or through the q.alt param, it throws a SolrException

          Sorry ... i was looking at my patched version – agreed it should throw ParseException

          At best the QueryComponent catches a ParseException but since it wraps both the parsing of the main query and filter queries into the same try-catch block, it just wraps the ParseException in a SolrException and throws it. Again, no context.

          ...that can be fixed though, it doesn't cahnge the question of semantics for the QParsers themselves

          Here's the patch i put together, the attempted goal was:

          1) add javadocs to QParser.parse documenting the (current) expectations
          2) automated test verifying that al built-in QParsers meet the expectations when given null or empty string
          3) standardizes lucene, dismax, and edismax to treat empty query strings as valid representations of matching nothing
          4) fixes all the other special case QParsers to throw a ParseError on null or empty string based on what makes sense (ie: things like TermQParser should allow "" as input, but things like FunctionQParser should not)

          ...however the test is currently broken, because it doesn't account for the fact that some of the special case QParsers have other required params – it does however nicely demonstrate other ways these Qparsers do a bad job of reporting errors.

          Given te level of debate in this issue, and the differences in opinion on what the right / wrong way to address this broader topic of query parsing validation i suggest we punt on this for Solr 3.6 – I'd rather leave things the way they are then make a bad decision in a hurry

          Show
          Hoss Man added a comment - It doesn't really do that though. Take DisMaxQParser for example, if it can't find a query string through either its given string or through the q.alt param, it throws a SolrException Sorry ... i was looking at my patched version – agreed it should throw ParseException At best the QueryComponent catches a ParseException but since it wraps both the parsing of the main query and filter queries into the same try-catch block, it just wraps the ParseException in a SolrException and throws it. Again, no context. ...that can be fixed though, it doesn't cahnge the question of semantics for the QParsers themselves — Here's the patch i put together, the attempted goal was: 1) add javadocs to QParser.parse documenting the (current) expectations 2) automated test verifying that al built-in QParsers meet the expectations when given null or empty string 3) standardizes lucene, dismax, and edismax to treat empty query strings as valid representations of matching nothing 4) fixes all the other special case QParsers to throw a ParseError on null or empty string based on what makes sense (ie: things like TermQParser should allow "" as input, but things like FunctionQParser should not) ...however the test is currently broken, because it doesn't account for the fact that some of the special case QParsers have other required params – it does however nicely demonstrate other ways these Qparsers do a bad job of reporting errors. Given te level of debate in this issue, and the differences in opinion on what the right / wrong way to address this broader topic of query parsing validation i suggest we punt on this for Solr 3.6 – I'd rather leave things the way they are then make a bad decision in a hurry
          Hide
          Chris Male added a comment -

          Sorry ... i was looking at my patched version – agreed it should throw ParseException

          No worries. I think we should ensure that all QParsers only use ParseException (I see that ExtendedDisMaxQParser uses a c&p of the logic from DisMaxQParser). I see that's what your patch does.

          Given te level of debate in this issue, and the differences in opinion on what the right / wrong way to address this broader topic of query parsing validation i suggest we punt on this for Solr 3.6 – I'd rather leave things the way they are then make a bad decision in a hurry

          I agree but I also think we should commit the improved error message suggested by David so that we avoid the unhelpful NPE. Any broader changes will be in 4.0 so we don't have a backwards compat problem.

          Show
          Chris Male added a comment - Sorry ... i was looking at my patched version – agreed it should throw ParseException No worries. I think we should ensure that all QParsers only use ParseException (I see that ExtendedDisMaxQParser uses a c&p of the logic from DisMaxQParser). I see that's what your patch does. Given te level of debate in this issue, and the differences in opinion on what the right / wrong way to address this broader topic of query parsing validation i suggest we punt on this for Solr 3.6 – I'd rather leave things the way they are then make a bad decision in a hurry I agree but I also think we should commit the improved error message suggested by David so that we avoid the unhelpful NPE. Any broader changes will be in 4.0 so we don't have a backwards compat problem.
          Hide
          Hoss Man added a comment -

          I agree but I also think we should commit the improved error message suggested by David so that we avoid the unhelpful NPE. Any broader changes will be in 4.0 so we don't have a backwards compat problem.

          Grrr... yes, i see ... SOLR-2001 is only on trunk, somehow i overlooked that and it contributed to my confusion as to some of the comments in this issue.

          So instead of NPEs or what not that you get in 3.5 from various parsers, we switch to consistent'new ParseException("missing query string");' in 3.6, and address if there can be better default handling in 4.0 (continuing what SOLR-2001 started)

          +1

          Show
          Hoss Man added a comment - I agree but I also think we should commit the improved error message suggested by David so that we avoid the unhelpful NPE. Any broader changes will be in 4.0 so we don't have a backwards compat problem. Grrr... yes, i see ... SOLR-2001 is only on trunk, somehow i overlooked that and it contributed to my confusion as to some of the comments in this issue. So instead of NPEs or what not that you get in 3.5 from various parsers, we switch to consistent'new ParseException("missing query string");' in 3.6, and address if there can be better default handling in 4.0 (continuing what SOLR-2001 started) +1
          Hide
          Hoss Man added a comment -

          patch against 3x that ensures the parsers likely to be used as defaults (lucene, lucenePlusSort, dismax, and edismax) give consistent ParseExceptions when the query sting is null or missing.

          this seems to be along the lines of what smiley and male where suggesting for 3.6, and then we could open a new issue to re-evaluate if/where the error handling/reporting of QParsers in general is lacking for 4.0 or if there should be different defaults (given that SOLR-2001 already made changes relate this this dicussion on trunk)

          Show
          Hoss Man added a comment - patch against 3x that ensures the parsers likely to be used as defaults (lucene, lucenePlusSort, dismax, and edismax) give consistent ParseExceptions when the query sting is null or missing. this seems to be along the lines of what smiley and male where suggesting for 3.6, and then we could open a new issue to re-evaluate if/where the error handling/reporting of QParsers in general is lacking for 4.0 or if there should be different defaults (given that SOLR-2001 already made changes relate this this dicussion on trunk)
          Hide
          David Smiley added a comment -

          Looks nice Hoss. There's a typo in your test javadoc: "change in SOlr 4.0". I did indeed notice the needless use of SolrException instead of ParseException and I think it that was in my patch.

          RE SOLR-2001 wow, yet another issue about the same problem – that makes 3. Clearly it's a problem! Why not simply apply the SOLR-2001 patch for consistent behavior?

          Show
          David Smiley added a comment - Looks nice Hoss. There's a typo in your test javadoc: "change in SOlr 4.0". I did indeed notice the needless use of SolrException instead of ParseException and I think it that was in my patch. RE SOLR-2001 wow, yet another issue about the same problem – that makes 3. Clearly it's a problem! Why not simply apply the SOLR-2001 patch for consistent behavior?
          Hide
          Hoss Man added a comment -

          Why not simply apply the SOLR-2001 patch for consistent behavior?

          good question ... if you're cool with that then it seems okay to me (although off the top of my head i think when i was looking at trunk one of those 4 "main" parsers still needed "fixed" to return null).

          in general my suggestion for 3.6 was largely based on the fact that there was still active discussion about what the best long term behavior was, which might contradict what was discussed in SOLR-2001, so better to play it safe and just clean up the error reporting: "I'd rather leave things the way they are then make a bad decision in a hurry"

          if you want to backport SOLR-2001 and sanity check that lucene/dismax/edismax/lucenePlusSort all return null on null/blank query strings i'm +1 to that (that seems consistent with what ryan/male/me were advocating here, so as long as your on board i think we're good)

          Show
          Hoss Man added a comment - Why not simply apply the SOLR-2001 patch for consistent behavior? good question ... if you're cool with that then it seems okay to me (although off the top of my head i think when i was looking at trunk one of those 4 "main" parsers still needed "fixed" to return null). in general my suggestion for 3.6 was largely based on the fact that there was still active discussion about what the best long term behavior was, which might contradict what was discussed in SOLR-2001 , so better to play it safe and just clean up the error reporting: "I'd rather leave things the way they are then make a bad decision in a hurry" if you want to backport SOLR-2001 and sanity check that lucene/dismax/edismax/lucenePlusSort all return null on null/blank query strings i'm +1 to that (that seems consistent with what ryan/male/me were advocating here, so as long as your on board i think we're good)
          Hide
          David Smiley added a comment -

          I'm onboard with SOLR-2001. My 1st concern was the NPE, my 2nd was a non-error default, my 3rd was :. I can't have everything

          I did consider the approach in SOLR-2001 which is to return null from QParser.parse() but I wasn't sure if that would be disruptive to any implied contract/expectation. I guess there's no issue with that.

          I'll try and attach a back-ported patch soon.

          Show
          David Smiley added a comment - I'm onboard with SOLR-2001 . My 1st concern was the NPE, my 2nd was a non-error default, my 3rd was : . I can't have everything I did consider the approach in SOLR-2001 which is to return null from QParser.parse() but I wasn't sure if that would be disruptive to any implied contract/expectation. I guess there's no issue with that. I'll try and attach a back-ported patch soon.
          Hide
          David Smiley added a comment -

          Attached is my 3x port of SOLR-2001. Modifications:

          • I added your test Hoss
          • Added empty string check
          • Added a Javadoc on QParser.parse() to indicate null may be returned
          • Added support for DisMaxQParser

          I'd like to commit this tonight to 3x, and commit the differences it has with trunk to trunk.

          Is a CHANGES.txt necessary?

          Show
          David Smiley added a comment - Attached is my 3x port of SOLR-2001 . Modifications: I added your test Hoss Added empty string check Added a Javadoc on QParser.parse() to indicate null may be returned Added support for DisMaxQParser I'd like to commit this tonight to 3x, and commit the differences it has with trunk to trunk. Is a CHANGES.txt necessary?
          Hide
          David Smiley added a comment -

          I committed the patch to 3x and made the small improvements to 4x.

          Show
          David Smiley added a comment - I committed the patch to 3x and made the small improvements to 4x.
          Hide
          Mark Miller added a comment -

          Reverted on 4x

          Show
          Mark Miller added a comment - Reverted on 4x
          Hide
          David Smiley added a comment -

          (This patch didn't break the build.)

          Since this is a back-port, what do I do with the SOLR-2001 entry in the CHANGES.txt in trunk? I suspect I should move the issue to the 3.6 section of the file, and commit that to the CHANGES.txt on both branches.

          For reference:

          * SOLR-2001: The query component will substitute an empty query that matches
            no documents if the query parser returns null.  This also prevents an
            exception from being thrown by the default parser if "q" is missing. (yonik)
            SOLR-435: if q is "" then it's also acceptable. (dsmiley, hoss)
          
          Show
          David Smiley added a comment - (This patch didn't break the build.) Since this is a back-port, what do I do with the SOLR-2001 entry in the CHANGES.txt in trunk? I suspect I should move the issue to the 3.6 section of the file, and commit that to the CHANGES.txt on both branches. For reference: * SOLR-2001: The query component will substitute an empty query that matches no documents if the query parser returns null. This also prevents an exception from being thrown by the default parser if "q" is missing. (yonik) SOLR-435: if q is "" then it's also acceptable. (dsmiley, hoss)
          Hide
          David Smiley added a comment -

          Re-committed to 4.x, and I moved the CHANGES.txt from the v4 to v3 section on both branches. Closing issue.

          Show
          David Smiley added a comment - Re-committed to 4.x, and I moved the CHANGES.txt from the v4 to v3 section on both branches. Closing issue.

            People

            • Assignee:
              David Smiley
              Reporter:
              Ryan McKinley
            • Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development