Solr
  1. Solr
  2. SOLR-2996

make "q=*" not suck in the lucene and edismax parsers

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2, 5.0
    • Component/s: query parsers
    • Labels:
      None

      Description

      More then a few users have gotten burned by thinking that "*" is the appropriate syntax for "match all docs" when what it really does (unless i'm mistaken) is create a prefix query on the default search field using a blank string as the prefix.

      since it seems very unlikely that anyone has a genuine usecase for making a prefix query with a blank prefix, we should change the default behavior of the LuceneQParser and EDismaxQParsers (and any other Qparsers that respect *:* if i'm forgetting them) to treat this situation the same as *:*. we can offer a (local)param to force the old behavior if someone really wants it.

      1. SOLR-2996.patch
        4 kB
        Yonik Seeley

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          Recent example of this type of confusion and the problems it can cause from the mailing list...

          https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201112.mbox/%3Calpine.DEB.2.00.1112131115550.16571@bester%3E

          Another recent discussion about this type of problem from IRC...

          13:25 < mikeliss:#solr> Hi, I'm running into an error with maxbooleanclauses when I try to do a range query with 
                                  highlighting...is there any workaround for this? Would really appreciate some direction, if 
                                  anybody knows.
          13:26 < mikeliss:#solr> This is the query that dies: 
          http://localhost:8983/solr/select/?q=*&version=2.2&start=0&rows=20&indent=on&hl=true&hl.fl=text,caseName,westCite,docketNumber,lexisCite,court_citation_string&hl.snippets=5&f.text.hl.alternateField=text&f.text.hl.maxAlternateFieldLength=500
          
          13:28 < hoss:#solr> that query doesn't make sense ... for a couple of reasons ... what are you *trying* to do?
          13:29 < hoss:#solr> i mena ... for starters ... there is no range query there.  second, q=* is a big red flag: it's 
                              a prefix query on the default field using the prefix "" (ie: the empty string)
          
          14:23 < mikeliss:#solr> hoss, yeah, I assumed that highlighting would just do nothing if a prefix query were given 
                                  on an empty string.
          14:24 < mikeliss:#solr> hoss, I added a check in my code that will only enable highlighting if the query isn't '*'.
          14:24 < mikeliss:#solr> hoss, Seems naive, but it's working at least for the moment.
          
          14:27 < hoss:#solr> i think you're missing my point: q=* is a fairly non-sensical query ... you should't just 
                              prevent highlighting on that query, you should stop doing that query in the first place
          14:28 < hoss:#solr> as a query solr can handle it, and optimize it to be efficient
          14:28 < hoss:#solr> (evenn though it's silly)
          14:28 < mikeliss:#solr> hoss, I'm using that query on my homepage to show the latest documents in the index. It 
                                  should just return everything, right?
          14:28 < hoss:#solr> but for highlighting, the highlighter actually needs to know all the terms it matches
          14:28 < hoss:#solr> and to konw al lthe terms it matches, it needs to look at *ALL* the terms in the default field
          14:29 < hoss:#solr> mikeliss: no, no, NO ... i'm not sure where people started getting the missconception that 
                              "q=*" matches all docs, but that is *NOT* what it does
          14:29 < hoss:#solr> one second...
          14:30 < hoss:#solr> mikeliss: 
          https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201112.mbox/%3CCAL69qOn1XeMNz6JYdWj_o7rH_=O3i-NiqdO6rorvN48bywU+nA@mail.gmail.com%3E
          14:30 < hoss:#solr> ...and...
          14:30 < hoss:#solr> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201112.mbox/%3Calpine.DEB.2.00.1112131115550.16571@bester%3E
          
          14:32 < mikeliss:#solr> hoss, ah, that makes sense. I guess * is just too tempting, since it is something users can 
                                  easily remember.
          14:34 < mikeliss:#solr> hoss, back to my original issue, now I'm confused why hl fails on a search for *. Shouldn't 
                                  it just highlight nothing, and return results? I wasn't able to get debugging to work for 
                                  the query, so I'm a bit confused..
          
          14:35 < hoss:#solr> see my other comment above: the highlighter is trying to find all the terms used in the query 
                              to highlight them -- a query for "*" matches all terms in the default field, which is way more 
                              then the highlighter can handle (hence the exception)
          
          14:38 < hoss:#solr> i'm filing a bug to change the beahvior of "q=*" ... do you mind if i cut/paste this dialog 
                              into the jira issue as an example of user confusion?
          14:39 < mikeliss:#solr> Not at all. I was wondering if that was potentially a bug...figured I'd leave it to the 
                                  experts.
          
          Show
          Hoss Man added a comment - Recent example of this type of confusion and the problems it can cause from the mailing list... https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201112.mbox/%3Calpine.DEB.2.00.1112131115550.16571@bester%3E Another recent discussion about this type of problem from IRC... 13:25 < mikeliss:#solr> Hi, I'm running into an error with maxbooleanclauses when I try to do a range query with highlighting...is there any workaround for this? Would really appreciate some direction, if anybody knows. 13:26 < mikeliss:#solr> This is the query that dies: http://localhost:8983/solr/select/?q=*&version=2.2&start=0&rows=20&indent=on&hl=true&hl.fl=text,caseName,westCite,docketNumber,lexisCite,court_citation_string&hl.snippets=5&f.text.hl.alternateField=text&f.text.hl.maxAlternateFieldLength=500 13:28 < hoss:#solr> that query doesn't make sense ... for a couple of reasons ... what are you *trying* to do? 13:29 < hoss:#solr> i mena ... for starters ... there is no range query there. second, q=* is a big red flag: it's a prefix query on the default field using the prefix "" (ie: the empty string) 14:23 < mikeliss:#solr> hoss, yeah, I assumed that highlighting would just do nothing if a prefix query were given on an empty string. 14:24 < mikeliss:#solr> hoss, I added a check in my code that will only enable highlighting if the query isn't '*'. 14:24 < mikeliss:#solr> hoss, Seems naive, but it's working at least for the moment. 14:27 < hoss:#solr> i think you're missing my point: q=* is a fairly non-sensical query ... you should't just prevent highlighting on that query, you should stop doing that query in the first place 14:28 < hoss:#solr> as a query solr can handle it, and optimize it to be efficient 14:28 < hoss:#solr> (evenn though it's silly) 14:28 < mikeliss:#solr> hoss, I'm using that query on my homepage to show the latest documents in the index. It should just return everything, right? 14:28 < hoss:#solr> but for highlighting, the highlighter actually needs to know all the terms it matches 14:28 < hoss:#solr> and to konw al lthe terms it matches, it needs to look at *ALL* the terms in the default field 14:29 < hoss:#solr> mikeliss: no, no, NO ... i'm not sure where people started getting the missconception that "q=*" matches all docs, but that is *NOT* what it does 14:29 < hoss:#solr> one second... 14:30 < hoss:#solr> mikeliss: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201112.mbox/%3CCAL69qOn1XeMNz6JYdWj_o7rH_=O3i-NiqdO6rorvN48bywU+nA@mail.gmail.com%3E 14:30 < hoss:#solr> ...and... 14:30 < hoss:#solr> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201112.mbox/%3Calpine.DEB.2.00.1112131115550.16571@bester%3E 14:32 < mikeliss:#solr> hoss, ah, that makes sense. I guess * is just too tempting, since it is something users can easily remember. 14:34 < mikeliss:#solr> hoss, back to my original issue, now I'm confused why hl fails on a search for *. Shouldn't it just highlight nothing, and return results? I wasn't able to get debugging to work for the query, so I'm a bit confused.. 14:35 < hoss:#solr> see my other comment above: the highlighter is trying to find all the terms used in the query to highlight them -- a query for "*" matches all terms in the default field, which is way more then the highlighter can handle (hence the exception) 14:38 < hoss:#solr> i'm filing a bug to change the beahvior of "q=*" ... do you mind if i cut/paste this dialog into the jira issue as an example of user confusion? 14:39 < mikeliss:#solr> Not at all. I was wondering if that was potentially a bug...figured I'd leave it to the experts.
          Hide
          Hoss Man added a comment -

          fix jira markup in description

          Show
          Hoss Man added a comment - fix jira markup in description
          Hide
          Jan Høydahl added a comment -

          Who knows QueryParser.jj grammar well enough to know where to fix this so that a bare "" gets translated to *:? Would make a great fix for 4.2 if it's a 15-min fix?

          Show
          Jan Høydahl added a comment - Who knows QueryParser.jj grammar well enough to know where to fix this so that a bare " " gets translated to *: ? Would make a great fix for 4.2 if it's a 15-min fix?
          Hide
          Alan Woodward added a comment - - edited

          I've had a quick look, and I think it's just a matter of changing the Clause() definition in QueryParser.jj. (Warning: untested patch)

          Index: solr/core/src/java/org/apache/solr/parser/QueryParser.jj
          ===================================================================
          --- solr/core/src/java/org/apache/solr/parser/QueryParser.jj	(revision 1451143)
          +++ solr/core/src/java/org/apache/solr/parser/QueryParser.jj	(working copy)
          @@ -195,7 +195,8 @@
           
             (
          -   q=Term(field)
          +   <STAR> {q=handleBareTokenQuery("*", "*", fuzzySlop, prefix, true, fuzzy, regexp)}
          +   | q=Term(field)
              | <LPAREN> q=Query(field) <RPAREN> (<CARAT> boost=<NUMBER>)?
              | (localParams = <LPARAMS> (<CARAT> boost=<NUMBER>)? { q=getLocalParams(field, localParams.image); }  )
             )
          
          

          I don't have time to look at this properly now, but I'll probably get a chance next week. Feel free to try out that patch though!

          Show
          Alan Woodward added a comment - - edited I've had a quick look, and I think it's just a matter of changing the Clause() definition in QueryParser.jj. (Warning: untested patch) Index: solr/core/src/java/org/apache/solr/parser/QueryParser.jj =================================================================== --- solr/core/src/java/org/apache/solr/parser/QueryParser.jj (revision 1451143) +++ solr/core/src/java/org/apache/solr/parser/QueryParser.jj (working copy) @@ -195,7 +195,8 @@ ( - q=Term(field) + <STAR> {q=handleBareTokenQuery( "*" , "*" , fuzzySlop, prefix, true , fuzzy, regexp)} + | q=Term(field) | <LPAREN> q=Query(field) <RPAREN> (<CARAT> boost=<NUMBER>)? | (localParams = <LPARAMS> (<CARAT> boost=<NUMBER>)? { q=getLocalParams(field, localParams.image); } ) ) I don't have time to look at this properly now, but I'll probably get a chance next week. Feel free to try out that patch though!
          Hide
          Jan Høydahl added a comment -

          Hmm

          lap:parser janhoy$ javacc QueryParser.jj
          Java Compiler Compiler Version 5.0 (Parser Generator)
          (type "javacc" with no arguments for help)
          Reading from file QueryParser.jj . . .
          org.javacc.parser.ParseException: Encountered " "}" "} "" at line 198, column 84.
          Was expecting one of:
              "instanceof" ...
              ";" ...
              "<" ...
          (snip)
              "(" ...
          

          Line 198 is the new <STAR>... line

          Show
          Jan Høydahl added a comment - Hmm lap:parser janhoy$ javacc QueryParser.jj Java Compiler Compiler Version 5.0 (Parser Generator) (type "javacc" with no arguments for help) Reading from file QueryParser.jj . . . org.javacc.parser.ParseException: Encountered " "}" "} "" at line 198, column 84. Was expecting one of: "instanceof" ... ";" ... "<" ... (snip) "(" ... Line 198 is the new <STAR>... line
          Hide
          Alan Woodward added a comment -

          I did say untested It needs a semicolon:

          +   <STAR> {q=handleBareTokenQuery("*", "*", fuzzySlop, prefix, true, fuzzy, regexp);}
          
          Show
          Alan Woodward added a comment - I did say untested It needs a semicolon: + <STAR> {q=handleBareTokenQuery( "*" , "*" , fuzzySlop, prefix, true , fuzzy, regexp);}
          Hide
          Jan Høydahl added a comment -

          Yea, found that out actually, but next obstacle

          lap:parser janhoy$ javacc QueryParser.jj
          Java Compiler Compiler Version 5.0 (Parser Generator)
          (type "javacc" with no arguments for help)
          Reading from file QueryParser.jj . . .
          Warning: Choice conflict involving two expansions at
                   line 198, column 4 and line 199, column 6 respectively.
                   A common prefix is: "*"
                   Consider using a lookahead of 2 for earlier expansion.
          

          Think we need to somehow say that q=Term(field) should be chosen only if NOT <STAR> or something. But if you have some time next week to test it that's great. The learning curve is a bit steep here.

          Show
          Jan Høydahl added a comment - Yea, found that out actually, but next obstacle lap:parser janhoy$ javacc QueryParser.jj Java Compiler Compiler Version 5.0 (Parser Generator) (type "javacc" with no arguments for help) Reading from file QueryParser.jj . . . Warning: Choice conflict involving two expansions at line 198, column 4 and line 199, column 6 respectively. A common prefix is: "*" Consider using a lookahead of 2 for earlier expansion. Think we need to somehow say that q=Term(field) should be chosen only if NOT <STAR> or something. But if you have some time next week to test it that's great. The learning curve is a bit steep here.
          Hide
          Yonik Seeley added a comment -

          How about this patch? No grammar modification required.

          Show
          Yonik Seeley added a comment - How about this patch? No grammar modification required.
          Hide
          Commit Tag Bot added a comment -

          [trunk commit] Yonik Seeley
          http://svn.apache.org/viewvc?view=revision&revision=1451906

          SOLR-2996: treat un-fielded * as :

          Show
          Commit Tag Bot added a comment - [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1451906 SOLR-2996 : treat un-fielded * as :
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] Yonik Seeley
          http://svn.apache.org/viewvc?view=revision&revision=1451910

          SOLR-2996: treat un-fielded * as :

          Show
          Commit Tag Bot added a comment - [branch_4x commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1451910 SOLR-2996 : treat un-fielded * as :
          Hide
          Uwe Schindler added a comment -

          Closed after release.

          Show
          Uwe Schindler added a comment - Closed after release.

            People

            • Assignee:
              Unassigned
              Reporter:
              Hoss Man
            • Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development