SOLR-218

Support for Lucene QueryParser properties via solrconfig.xml file

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: 4.9, 5.0
    • Component/s: search
    • Labels:
      None

      Description

      The SolrQueryParser class, which extends Lucene's QueryParser class, does not provide any way of setting the various QueryParser properties via the solr config file (solrconfig.xml). These properties include:

      allowLeadingWildcard: Set to true to allow * and ? as the first character of a PrefixQuery and WildcardQuery.
      dateResolution: Sets the default date resolution used by RangeQueries for fields for which no specific date resolution has been set.
      defaultOperator: Sets the boolean operator of the QueryParser.
      fuzzyMinSim: Set the minimum similarity for fuzzy queries.
      locale: Set locale used by date range parsing.
      lowercaseExpandedTerms: Whether terms of wildcard, prefix, fuzzy and range queries are to be automatically lower-cased or not.
      phraseSlop: Sets the default slop for phrases.
      useOldRangeQuery: By default QueryParser uses new ConstantScoreRangeQuery in preference to RangeQuery for range queries.

      This can be achieved by calling the setter methods for these properties in the SolrQueryParser constructor:

      public SolrQueryParser(IndexSchema schema, String defaultField) {
        super(defaultField == null ? schema.getDefaultSearchFieldName() : defaultField, schema.getQueryAnalyzer());
        this.schema = schema;
        setAllowLeadingWildcard(SolrConfig.config.getBool("query/setAllowLeadingWildcard"));
        setLowercaseExpandedTerms(SolrConfig.config.getBool("query/lowerCaseExpandedTerms"));
      }

      In addition, Solr should not modify these values from the defaults provided by Lucene, as it currently does by calling setLowercaseExpandedTerms(false) in this method.
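A minimal sketch of the idea above: read each option from a flat key/value view of the config, and keep a supplied default whenever a key is absent. This is not Solr code. ParserOptions, fromConfig, readBool and the key names "query/allowLeadingWildcard" and "query/lowercaseExpandedTerms" are hypothetical, and a plain Map stands in for the parsed solrconfig.xml.

```java
import java.util.Map;

/** Hypothetical holder for two of the parser options named in this issue. */
final class ParserOptions {
    final boolean allowLeadingWildcard;
    final boolean lowercaseExpandedTerms;

    ParserOptions(boolean allowLeadingWildcard, boolean lowercaseExpandedTerms) {
        this.allowLeadingWildcard = allowLeadingWildcard;
        this.lowercaseExpandedTerms = lowercaseExpandedTerms;
    }

    /**
     * Builds options from config values, falling back to the supplied
     * defaults for any key that is not present. The key names are
     * assumptions made for this sketch.
     */
    static ParserOptions fromConfig(Map<String, String> config, ParserOptions defaults) {
        return new ParserOptions(
            readBool(config, "query/allowLeadingWildcard", defaults.allowLeadingWildcard),
            readBool(config, "query/lowercaseExpandedTerms", defaults.lowercaseExpandedTerms));
    }

    /** Returns the parsed boolean for key, or fallback when the key is missing. */
    private static boolean readBool(Map<String, String> config, String key, boolean fallback) {
        String raw = config.get(key);
        return raw == null ? fallback : Boolean.parseBoolean(raw.trim());
    }
}
```

Passing the defaults in explicitly keeps the question of which defaults to use (Lucene's or Solr's current ones) separate from the question of how the options are read.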

        Issue Links

          Activity

          Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          Furkan KAMACI added a comment - - edited

          Mark Miller: this is an old issue, and SolrQueryParser has changed since then. The comments on this issue include an effort to support leading wildcards, but https://issues.apache.org/jira/browse/SOLR-1321 has resolved that problem, as described here: http://lucene.472066.n3.nabble.com/hi-allowLeadingWildcard-is-it-possible-or-not-yet-td495457.html Is there anything left to do for this issue? If not, the issue can be closed; if so, I can help and create a patch for it.

          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Hoss Man added a comment -

          Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          email notification suppressed to prevent mass-spam
          pseudo-unique token identifying these issues: hoss20120321nofix36

          Robert Muir added a comment -

          3.4 -> 3.5

          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Mark Miller added a comment -

          I made the same changes, but I was using trunk. I'll give it a shot with 1.3 in a bit and report back.

          Jonas Salk added a comment - - edited

          That is good to hear, Mark. Would you mind taking a few minutes to write down the exact code changes you made? I want to retrace everything I did to make sure I haven't made any mistakes. I've basically updated only one Java file: SolrQueryParser.java.

          I'm using: apache-solr-1.3.0
          Jonas

          SolrQueryParser.java
          ------------------------------
          public SolrQueryParser(IndexSchema schema, String defaultField) {
            ...
            // added
            setAllowLeadingWildcard(true);
            setLowercaseExpandedTerms(true);
            ...
          }

          ...
          public SolrQueryParser(QParser parser, String defaultField, Analyzer analyzer) {
            ...
            setAllowLeadingWildcard(true);
            setLowercaseExpandedTerms(true);
            ...
          }
          Mark Miller added a comment -

          Hey Jonas, I just did the same and it worked no problem.

          Perhaps try a clean and build the project again?

          I grabbed a fresh checkout of Solr, loaded the example docs, tried your search but with the 'name:*pod' and it blew up as expected.

          I made the changes, ran example again, and the queries worked as expected.

          Anything of importance that you have going on different there?

          Jonas Salk added a comment -

          Not an expert at SOLR, currently using the default configurations which are shipped with it. Taking the suggestions above to update SolrQueryParser to allow leading wildcards as Lucene does, I modified the following 2 constructors:

          public SolrQueryParser(IndexSchema schema, String defaultField) {
            ...
            // added
            setAllowLeadingWildcard(true);
            setLowercaseExpandedTerms(true);
            ...
          }

          ...
          public SolrQueryParser(QParser parser, String defaultField, Analyzer analyzer) {
            ...
            setAllowLeadingWildcard(true);
            setLowercaseExpandedTerms(true);
            ...
          }

          Everything compiled and 'ant' built a distro. However, a query request still throws an exception:

          Test query: "http://localhost:8983/solr/select/?indent=on&q=CommentText:Hello"

          Finds two documents.

          However, wildcard query: "http://localhost:8983/solr/select/?indent=on&q=CommentText:*ello"

          throws this exception:
          org.apache.lucene.queryParser.ParseException: Cannot parse 'CommentText:*ello': '*' or '?' not allowed as first character in WildcardQuery

          Any suggestions on how I can prevent this exception and get this to work?

          Regards,
          Jonas

          Shalin Shekhar Mangar added a comment -

          Marking for 1.5

          Hoss Man added a comment -

          In response to a question on the mailing list about how best to tackle this...

          It's not documented very well (or: at all) at the moment, but it's now possible to declare "<queryParser>" in your solrconfig.xml, just like "<requestHandler>" ... and those each correspond to a QParserPlugin. The LuceneQParserPlugin could be modified to take in some init options and use them in its "createParser" method to set options on the underlying SolrQueryParser. People could declare multiple instances of the LuceneQParserPlugin with different names, and use them by specifying a defType in their request – or they could give one of those instances the name "lucene" and it will be used by default.
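To illustrate the lookup described above, here is a rough sketch of named parser instances selected by defType, with the name "lucene" acting as the default. It does not use Solr's plugin classes. ParserRegistry, register and resolve are hypothetical names, and a Map of init options stands in for a configured LuceneQParserPlugin instance.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry of named parser configurations (a stand-in, not Solr API). */
final class ParserRegistry {
    /** Name that is used when a request does not specify a defType. */
    static final String DEFAULT_NAME = "lucene";

    private final Map<String, Map<String, String>> initArgsByName = new HashMap<>();

    /** Registers one named instance together with its init options. */
    void register(String name, Map<String, String> initArgs) {
        initArgsByName.put(name, new HashMap<>(initArgs));
    }

    /** Returns the init options for the requested defType, or for "lucene" if none is given. */
    Map<String, String> resolve(String defType) {
        String name = (defType == null || defType.isEmpty()) ? DEFAULT_NAME : defType;
        Map<String, String> args = initArgsByName.get(name);
        if (args == null) {
            throw new IllegalArgumentException("Unknown query parser: " + name);
        }
        return args;
    }
}
```

With this shape, registering a differently configured instance under the name "lucene" changes the default behaviour, while any other name is only used when a request asks for it.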

          Hoss Man added a comment -

          Quick thought for anyone that may want to tackle a patch for this... given the "recent" addition of search components, it may make sense to completely deprecate the existing <solrQueryParser .. /> directive in the schema.xml and make all of these options for the "QueryComponent" class.

          (that way people can register multiple instances of the QueryComponent with different options, and then use those alternate instances in different handler instances)

          Hoss Man added a comment -

          reminder: when addressing this, we should make sure there is an option for turning off ConstantScorePrefixQuery as well .. some people may prefer the stock lucene behavior (particularly if no good solution is found for SOLR-195)

          Michael Kimsal added a comment -

          setAllowLeadingWildcard(SolrConfig.config.getBool("query/setAllowLeadingWildcard"));
          setLowercaseExpandedTerms(SolrConfig.config.getBool("query/lowerCaseExpandedTerms"));
          =============================================================================

          From what I understand, these sorts of things could likely be handled by custom query parsers. However,
          I'm voting for this because I'd like to see the ability to configure these items globally first, as well as
          already having the option to write custom query parsers if needed. This provides an easier way to
          configure the behaviour without needing to write code or recompile anything.

          Yonik Seeley added a comment -

          > I just don't see how it's possible for Solr to "figure out the right thing to do automatically" in every case.

          Here's my shot at it: SOLR-219

          Yonik Seeley added a comment -

          > Meanwhile, I will look at writing a plugin so I can get the functionality I need without having to modify the Solr source.

          I'm confident this will be fixed, but in the meantime isn't the simplest solution to lowercase any prefix or wildcard query in the client?
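A minimal sketch of that client-side workaround, assuming a simple query made of space-separated clauses with an optional field: prefix. WildcardLowercaser and lowercaseWildcardTerms are hypothetical names. The sketch does not handle quoted phrases, escaped characters or range syntax.

```java
import java.util.Locale;

/** Hypothetical client-side helper; not part of Solr or Lucene. */
final class WildcardLowercaser {
    /**
     * Lowercases the term part of every space-separated clause that
     * contains '*' or '?', leaving field names and all other clauses
     * (including operators such as AND) untouched.
     */
    static String lowercaseWildcardTerms(String query) {
        StringBuilder out = new StringBuilder();
        for (String clause : query.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (clause.indexOf('*') < 0 && clause.indexOf('?') < 0) {
                out.append(clause); // not a wildcard or prefix clause
                continue;
            }
            int colon = clause.indexOf(':');
            String field = clause.substring(0, colon + 1); // empty when there is no field prefix
            String term = clause.substring(colon + 1);
            out.append(field).append(term.toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }
}
```

For example, lowercaseWildcardTerms("CommentText:Hel* AND Name:iPod") returns "CommentText:hel* AND Name:iPod". This only helps when the indexed terms for the field are themselves lowercased.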

          Michael Pelz-Sherman added a comment -

          I just don't see how it's possible for Solr to "figure out the right thing to do automatically" in every case.

          Even if this were possible, I don't see how it harms Solr to offer access to these configuration parameters. Whether this is done through the solrconfig.xml or the schema.xml isn't really important to me; I would just like to have some way of adjusting these parameters without having to write a plugin. If it can be a per-field setting, great, but it's nice to have a global setting as well.

          As for setLowercaseExpandedTerms(), it seems to me that Solr should not override the default settings provided by Lucene without a very solid reason. For such a young product, I question whether backward compatibility is a valid justification for doing so.

          Anyway, thanks very much for considering this. Meanwhile, I will look at writing a plugin so I can get the functionality I need without having to modify the Solr source.

          Yonik Seeley added a comment -

          With regards to doing something like lowercasing wildcard queries, that should be a per-field setting for Solr.
          But as I stated here: http://www.nabble.com/case-sensitivity-tf3654523.html
          I think Solr should figure out the right thing to do automatically.
          The implementation should probably be a method on FieldType that handles lowercasing (or otherwise manipulating) the wildcard query if necessary, or perhaps throwing an exception if it's just not supported for that field type.

          Hoss Man added a comment -

          1) these options shouldn't be specified in the solrconfig.xml, they should come from the schema.xml (since knowing whether you want these options tends to depend on how you have configured the fields) ... the <solrQueryParser .. /> directive already exists for this purpose, and defaultOperator is already supported.

          2) these settings shouldn't be applied in the constructor for SolrQueryParser per the contract described in its comment...

          /**
           * Constructs a SolrQueryParser using the schema to understand the
           * formats and datatypes of each field. Only the defaultSearchField
           * will be used from the IndexSchema (unless overridden),
           * <solrQueryParser> will not be used.

          ...changing that makes it very hard for plugin writers to subclass SolrQueryParser to get a schema aware parser with no other changes. This is what QueryParsing.parseQuery is for (although i fully support a new factory method for returning an instance of a SolrQueryParser with these options set on it).

          3) options like dateResolution assume use of DateTools which is not what DateField uses ... it's possible some users might be using StrField and using DateTools to format it on the client side, but we should think carefully before adding this option.

          4) do we want another option to enable/disable the use of PrefixFilter for prefix queries that is currently in SolrQueryParser? it does cause problems with highlighting prefix queries.

          5) regarding this comment...

          > In addition, solr should not modify these values from the defaults provided by Lucene, as it currently does by
          > calling setLowercaseExpandedTerms(false) in this method.

          ...while i fully agree with this sentiment, changing now to align ourselves with the Lucene default would break backwards compatibility for existing Solr users. If the option is not used in the schema.xml, we need to assume setLowercaseExpandedTerms(false).


            People

            • Assignee:
              Unassigned
              Reporter:
              Michael Pelz-Sherman
            • Votes:
              19
              Watchers:
              16
