Details
-
Improvement
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
None
-
None
Description
I've always been surprised to see the <defaultSearchField> element and <solrQueryParser defaultOperator="OR"/> defined in the schema.xml file since the first time I saw them. They just seem out of place to me since they are more query parser related than schema related. But not only are they misplaced, I feel they shouldn't exist. For query parsers, we already have a "df" parameter that works just fine, and explicit field references. And the default lucene query operator should stay at OR – if a particular query wants different behavior then use q.op or simply use "OR".
<similarity> Seems like something better placed in solrconfig.xml than in the schema.
In my opinion, defaultSearchField and defaultOperator configuration elements should be deprecated in Solr 3.x and removed in Solr 4. And <similarity> should move to solrconfig.xml. I am willing to do it, provided there is consensus on it of course.
Attachments
Attachments
- SOLR-2724_deprecateDefaultSearchField_and_defaultOperator.patch
- 7 kB
- David Smiley
- SOLR-2724_deprecateDefaultSearchField_and_defaultOperator.patch
- 4 kB
- David Smiley
Issue Links
Activity
I like "stop advertising them in the examples". I'd rather people do things the preferred way (df & q.op params) and not see these in the schema and think, "oh, I'll set them here".
RE Similary: Ok; I didn't know they were per-field in trunk. It makes sense to keep them here then.
If I make a patch complying with how you want this to work, will you commit it?
FWIW, PingRequest fails if there is no defaultSearchField
CommonsHttpSolrServer#ping produces:
org.apache.solr.common.SolrException: Ping query caused exception: no field name specified in query and no defaultSearchField defined in schema.xml
at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:77)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1407)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
...
request: http://127.0.0.1:8983/solr/a1/admin/ping?wt=javabin&version=2'
Yury,
The ping fails only because it evaluates a query defined in solrconfig.xml which has a query that does not specify which field it is to search. A successful conclusion to this issue would also involve adding a default search field to this query. The same goes for the firstSearcher query.
I want to breath some life into this issue in time for 3.6. Hoss proposed the following:
i guess what i would propose is that since the only real issue with them is potential confusion about where/why they exist, we:
- leave the code in to use them as fall back defaults
- leave tests in place to sanity check that they work
- stop advertising them in the examples
...ie: quietly deprecate in 3x, but don't explicitly remove in 4x.
In particular, I'd like to see these removed from the example schema right away in 3x and trunk, as Hoss suggests. q.df can be added to relevant search handlers in the solrconfig.xml. Maybe we log warnings when they are configured?
For 4, I personally think they should be removed; I don't understand why Hoss suggests not to. Perhaps wait for a subsequent 4.x point release?
We have apps that always use "AND" (or mm=100%). So for us, the global default is nice to have. I can see why it belongs in solrconfig.xml and not schema.xml though.
deprecate in 3x OK. Remove from 3x examples very much NOT OK. Add a note in the 3.x schema that this is no longer preferred, sure. But leave it in.
Remove them from trunk schema.xml OK. Remove support from 4x NOT OK. Our policy is generally to be back-compat through one major release. This isn't quite breaking things but close enough IMO.
My reasoning is that there are lots of people running with pre 3.6 Solr. If I were them, I would be pretty annoyed if I got a point release, moved my <fieldType> and <field> and <copyField> definitions over to a new example schema, started up my system and had it behave differently because the <defaultField> was quietly removed.
This is one of those issues that, while it may well be technically preferred, making this change is making decisions that affect running systems to no benefit of people already running those systems. "First do no harm" comes to mind.
Yes, all my objections can be met with "but all they have to do is make sure that they put <defaultField> back". But that's not the point. We shouldn't make gratuitous changes that make life harder for people trying to upgrade a point release.
Going to trunk is another matter, it will require a significant effort (reindexing comes to mind) and users will approach that with much more care than 3.5 -> 3.6.
FWIW
Erick
it will require a significant effort (reindexing comes to mind)
Who has to reindex? we work hard to support and test 3.x via the codecs mechanism
and we also keep around buggy 3.x behavior baggage in all the analyzers for 4.0 too.
Right, but to get all the goodness of 4.x, I though re-indexing was preferred. My point is that we have more freedom for changes like this in a major release than a minor one and should not introduce gratuitous difficulties for our end users unless it's part of a tangible benefit in a point release. This change has no tangible benefits that I can see but does have a downside.
Erick
If you want all the performance and index improvements, at the worst you use the IndexUpgrader
(http://lucene.apache.org/core/old_versioned_docs/versions/3_2_0/api/all/org/apache/lucene/index/IndexUpgrader.html) to force all your segments to be rewritten in the new format...
Erick,
First of all, anyone upgrading should examine CHANGES.txt which would certainly have a note on this explicitly in the upgrading section of that document. Secondly, most point-release upgrades people do are ones in which the config stays the same. Thirdly, the scenario you present in which someone copies a subset of their 3.5 schema would only error if that person ALSO chose to omit the defaultSearchField declaration – but why would they do that? Besides, I don't think there should be any expectation of "it'll just work" if you copy some arbitrary subset of an old schema into a new example config.
I didn't know we needed to retain backwards config capability across 3.x to 4.x That sucks and it'll make old code stick around longer. If that's true, it'd be nice if we had a standard way to annotate such code so we can methodically remove the old stuff. "@deprecated" isn't enough because it's just at the method/class level, not an internal if branch.
David:
Well, I don't particularly want to argue indefinitely on this one, so we'll probably have to agree to disagree. But I totally disagree with this statement:
bq: Besides, I don't think there should be any expectation of "it'll just work" if you copy some arbitrary subset of an old schema into a new example config.
From a user's perspective:
"I know all I changed were X, Y and Z in my schema, I'll copy those to the new one and I should be good to go. That way I'll pick up any new goodness without having to examine every line, including ones I haven't changed." is quite reasonable IMO. Messy, but true. We have a better argument with a major release when we say "you should have read the CHANGES.txt file".
My touchstone is that on a point release I think this kind of change can make our user's lives harder to no good purpose, this change doesn't give them capabilities they don't already have in exchange for any difficulties that result.
And I'm not saying that backwards compat on the config is required, it's not as far as I know. It's more that I'd vastly prefer to mark this as deprecated in 3.6 and perhaps remove it in the 4.x schema.xml, but leave in the code in 4.x so if someone does have it in their 4.0 schema they're not surprised...
At any rate, I can live with whatever decision comes out here, but misquoting the Lorax, "I speak for the users".
bq: Besides, I don't think there should be any expectation of "it'll just work" if you copy some arbitrary subset of an old schema into a new example config.
From a user's perspective:
"I know all I changed were X, Y and Z in my schema, I'll copy those to the new one and I should be good to go. That way I'll pick up any new goodness without having to examine every line, including ones I haven't changed." is quite reasonable IMO. Messy, but true. We have a better argument with a major release when we say "you should have read the CHANGES.txt file".
I don't quite agree with this reasoning but I'll pretend I do for a moment. If a user actually didn't change the defaultSearchField "text" from the example schema.xml as you suggest in your scenario, then I think things will probably still work. Why? Because the new solrconfig.xml's search handler and other applicable handlers will have q.df=text. They might not be using that handler because they are using their own – true, but I'm pointing out it's even more unlikely than you think it is.
The attached patch is intended to make it into 3.6. It comments out the defaultSearchField and defaultOperator from schema.xml with a DEPRECATED comment. By leaving it in the file, albeit commented, it should reduce confusion. I added the "df" parameter to the /select request handler. I also modified a couple log messages and exception messages in connection with this.
I intend to commit this ~midnight EST.
Oh and when I commit to trunk, I think the commented out schema.xml declarations can simply be deleted.
I think I agree with Hoss and Erick here... we're not really gaining anything with this change, and it does seem likely it will cause problems for people upgrading.
edit: I wouldn't treat q.op and defaultField the same way either... I agree w/ Hoss that defaultSearchField seems useful, and I don't even think it should be deprecated or declared bad practice. I would remove q.op from the example schema (but perhaps retain the code for those who already use it).
Whatever happens with this issue, please note SOLR-3292 and commit r1306642 that was needed on the 3x branch to keep all aspects of the example working.
(i didn't make changes on trunk for SOLR-3292 since miller had already reverted SOLR-2724 there, SOLR-3292's changes should be included if/when SOLR-2724 is re-applied)
I take issue with the Yonik's comment "we're not really gaining anything with this change". I don't know about you, but I'm always pondering how Solr can work better and be easier to understand / configure. I don't think defaultSearchField & defaultOperator have a need to exist, let alone exist in schema.xml, thereby creating unnecessary complexity in understanding the product – albeit in a small way. My work with Erik on simplifying the solrconfig.xml is along the same lines with my eventual hope to simply eliminate default="true" on a RH and the handleSelect thing.
I do agree that defaultSearchField is more useful than defaultOperator, but when it is depended on for some search or another (like the /browse fq=ipod example in SOLR-3292) , IMO it would been better/clearer for the query to be explicit.
So this issue is stale since March and Solr hangs now between heaven and hell which means that defaultSearchField is disabled in schema.xml and marked as deprecated, but the method getDefaultSearchFieldName() still exists and gives now no fallback to a default. This is bad and breaks pieces of Solr, like edismax and several more.
And the solution with df in the defaults of RequestHandler is also not working.
Now what, revert and release a fixed 3.6.2 or fix getDefaultSearchFieldName() and release a 3.6.2?
What about defaultOperator, is this one having/producing the same kind of problems as defaultSearchField?
Bernd:
Support for a default search field name still exists mostly for compatibility, and in perhaps some peoples' views as a matter of preference. It wasn't actually required before its deprecation. I thought it was only the fallback for parsing a lucene query but you indeed point out dismax has it as a fallback for 'qf', and it's used by the highligher as a fallback for 'hl.fl' although it appears the highlighter consults 'df' too. The main point behind its deprecation is that I think you should be explicit in a request which field(s) apply to what query strings or other features because the schema (schema.xml) can't know. The same applies to the default query operator which is even more of an odd duck sitting in schema.xml.
Bernd, simply define "qf" in your request handler definition to make Solr respond correctly to the same queries you had before. Arguably, Dismax/Edismax should consult "df" as a default when "qf" isn't specified. I created SOLR-3534 for this issue.
David: looking back at the mailing list ~ 28/Mar/12 it's not clear what exactly was the problem that required reverting at the time ... where the test failures even caused by this specific issue, or something else that you committed right around the same time?
Given that we've already created the 4x branch and started pushing towards Alpha, i would at least move forward with making sure trunk & 4x are on parity with 3.6 in terms of the changes to the example and the log/error messages.
Depending on what the issue was with the tests we can figure out how we want to move forward from there.
I take issue with the Yonik's comment "we're not really gaining anything with this change". ... I don't think defaultSearchField & defaultOperator have a need to exist, let alone exist in schema.xml, thereby creating unnecessary complexity in understanding the product – albeit in a small way.
I think the question is "if we stop promoting them in the example, and start encouraging an alternative instead, what is gained by actually removing the support in the code for existing users who already have them in the config and upgrade?"
It's one thing to say in CHANGES.txt "we've removed feature X because doing so allowed us (add feature|fix bug) Y, so if you used X in the past now you have to use Z instead" but there is no "Y" in this case (that i see) ... we're just telling people "we've removed X because we think Z is better, so if you used X in the past now you have to use Z instead".
You may feel it's a complexity for new users to understand why these things are in schema.xml – which is fine, but isn't removing from the example schema.xml enough to addresses? What is the value gained in removing the ability to use it for existing users who already understand it? This is the crux of my suggestion way, way, WAY back in this issue about why i didn't/don't think there was a strong motivation to remove support completely in 4x - an opinion echoed by Yonik & Erick.
As evidence from recent mailing list comments by folks like Bernd & Rohit, there is already clearly confusion for existing users just by removing these from the example – let alone removing support for it from the code.
David: looking back at the mailing list ~ 28/Mar/12 it's not clear what exactly was the problem that required reverting at the time ... where the test failures even caused by this specific issue, or something else that you committed right around the same time?
That was in relation to another issue committed immediately before or after this issue, and so to be safe this issue was reverted as it was unclear from whoever did the reverting what was the cause of the problem. I am more diligent about running all tests before committing now :-}
Given that we've already created the 4x branch and started pushing towards Alpha, i would at least move forward with making sure trunk & 4x are on parity with 3.6 in terms of the changes to the example and the log/error messages.
Gladly. I have been awaiting further input. I'll post a patch.
RE removal of support for defaultSearchField: I may have long ago recommended removing support for defaultSearchField (and default query operator) in the code but I've long given up on that. It seems once something is in Solr, it can't ever be removed, only discouraged/deprecated. I hope I'm wrong.
One thing I realized is that a delete query might depend on the default search field or operator – which is a very bad idea and thus all the more reason to deprecate them. I've seen a real situation where a delete query script was accidentally given arguments where the devops guy thought they were entering in IDs (which looked like URLs) when they were being treated as queries. So they were tokenized, and some random cross-section of the data was deleted, instead of getting an error about terms being ambiguous.
There is a test in SolrExampleTests line 499 that assumes a deleteByQuery("??::??") is going to fail due to a syntax exception when it will actually fail because no field name was specified in the query.
This is a fairly small update to the patch, which passes all tests against trunk. Notes:
- I incorporated
SOLR-3292- the addition of 'df' to many request handlers in solrconfig.xml. I left out /clustering because it uses 'qf' for dismax and I didn't anticipate someone would add crazy fq=ipod style filter queries to it, as was done in /browse. - SolrExampleTests small test related to deleteByQuery
- Added a checkNullField() to SolrQueryParser.getWildcardQuery(). In diagnosing the wildcard deleteByQuery test I noticed the error message was an unhelpful NPE.
As with SOLR-3534, I think it would be helpful to users and ease the support burden if we added a warning to the comment for the deprecated <defaultSchemaField> that says "WARNING: The "df" query request parameter, including the default in any query request handlers, overide this default search field. So, before uncommenting this deprecated element, be sure to review how "df" may be used in relevant request handlers in such a way as to override this deprecated default search field."
I agree Jack. I spent some time wordsmithing the comments, trying to be more descriptive but not too much. Here is what I propose:
<!-- DEPRECATED: The defaultSearchField is consulted by the Lucene query parser[*] when parsing a query string that isn't explicit about the field. Machine (non-user) generated queries are best made explicit, or they can use the "df" request parameter which takes precedence over this. [*]: It's used by some other parsers and in a couple corners of Solr too. Note: Un-commenting defaultSearchField will be insufficient if your request handler in solrconfig.xml defines "df", which takes precedence. That would need to be removed. <defaultSearchField>text</defaultSearchField> --> <!-- DEPRECATED: The defaultOperator (AND|OR) is consulted by the Lucene query parser[*] when parsing a query string to determine if a clause of the query should be marked as required or optional, assuming the clause isn't already marked by some operator. The default is OR, which is generally assumed so it is not a good idea to change it globally here. The "q.op" request parameter takes precedence over this. [*]: It's used by some other parsers and in a couple corners of Solr too. <solrQueryParser defaultOperator="OR"/> -->
Sounds good. At least to me, but I already know what you mean.
Although, I'm not so sure we need to drag "Lucene" into this. I mean, technically the Lucene query parser is completely innocent and it is the Solr wrapper that "does the deed." Maybe it is sufficient to refer to "various query parsers, including the default Solr query parser, dismax, and edismax."
I would suggest that we advise the user: "The preferred technique is to revise the "df" parameter for the request handlers that your application will use."
(Hmmm... An advantage of defaultSearchField was that you do it in one place and then it is set for all request handlers for which it is relevant!)
Although, I'm not so sure we need to drag "Lucene" into this. I mean, technically the Lucene query parser is completely innocent and it is the Solr wrapper that "does the deed." Maybe it is sufficient to refer to "various query parsers, including the default Solr query parser, dismax, and edismax."
The "Lucene query parser" is an accurate statement insofar as there being a query parser named "lucene" (which is technically Solr's extension to the raw Lucene one). But I agree with your recommendation on substituting "various query parsers", and then I can remove the [*] note.
I would suggest that we advise the user: "The preferred technique is to revise the "df" parameter for the request handlers that your application will use." (Hmmm... An advantage of defaultSearchField was that you do it in one place and then it is set for all request handlers for which it is relevant!)
My recommendation / point of view doesn't coincide with yours. I think your user queries ('q' param) should be [e]dismax and should specify "qf" and "mm". Other queries (like 'fq', 'bq', 'facet.query', ...) are machine generated and I think they should be explicit / unambiguously stand-alone without the need for "df", "q.op" or these global deprecated ones here since they have sweeping affects (potentially affect queries you didn't want them to) and reduce the clarity of interpreting any one of these queries by itself because of ambiguity. I do use 'df' and 'q.op' in my apps as local-params on occasion.
The fact that most of the request handlers in the default solrconfig.xml define 'df' as part of this patch is unfortunate, and has more to do with the legacy of Solr usage. For example the "/browse" one has it because it includes an unrealistic query – facet.query=ipod Please! Even if I fix this query (and I probably will if I remember), I expect to meet resistance from some on wether to keep/exclude 'df' and I don't want to argue further on this subject.
I agree that we should encourage machine-generated queries to use explicit field names. My suggested "recommendation" is simply for the case of user-written queries passed via the "q" parameter so that we're not leaving the user hanging, wondering what to do, or more importantly, what the preferred action should be to go along with the spirit of the deprecation. In short, hopefully you can combine my suggested recommendation with your machine-generated query provision so that my suggested recommendation doesn't sound like it is in conflict.
My wording mentions "df" twice and there's probably twice as much explanation now than before, so IMO, I don't think users will be left hanging, wondering what to do.
(I appreciate your feedback, Jack)
Committed to trunk in r1351931, and to 4x in r1351935
The small log level messages in IndexWriter were accidentally committed as part of SOLR-3534 instead of this one.
Closing.
Hello.
Looking at the schema.xml that ships with solr 4.5.1, we still have
<!-- field for the QueryParser to use when an explicit fieldname is absent --> <defaultSearchField>text</defaultSearchField> <!-- SolrQueryParser configuration: defaultOperator="AND|OR" --> <solrQueryParser defaultOperator="OR"/>
Shouldn't those fields be removed?
I tried to remove them but some of our tests are failing, meaning that some Solr components are still using them.
I don't yet know which components though.
I just checked but I'm not seeing it in ./example/solr/collection1/conf/schema.xml which is the main reference schema. The multicore example still has that. I'm more of the opinion ./multicore should be simply omitted altogether than improved.
I'm not surprised some tests depend on some schemas having these defaults. These schema elements were marked deprecated which means they are discouraged but still supported. They were written before SOLR-2724, most likely. Nothing should fundamentally depend on these being specified in schema.xml; there are other better places like 'df' & 'q.op' in local-params.
Hi,
I do not know if my comment is useful, but I found some code that still uses the defaultOperator from schema.xml in solr-core-4.5.1
public static String parseMinShouldMatch(final IndexSchema schema, final SolrParams params) { org.apache.solr.parser.QueryParser.Operator op = QueryParsing.getQueryParserDefaultOperator (schema, params.get(QueryParsing.OP)); return params.get(DisMaxParams.MM, op.equals(QueryParser.Operator.AND) ? "100%" : "0%"); }
public static QueryParser.Operator getQueryParserDefaultOperator(final IndexSchema sch, final String override) { String val = override; if (null == val) val = sch.getQueryParserDefaultOperator(); return "AND".equals(val) ? QueryParser.Operator.AND : QueryParser.Operator.OR; }
FWIW: defaultSearchField and solrQueryParser/@defaultOperator both predate all of the QParser stuff, and i agree that if we had QParsers from the begining, they probably would have never existed.
Conceptually, when you think about schema design, and the roles of "schema maintainer" and "solr instance maintainer" i view the defaultSearchField as a fairly important piece of information the schema maintainer is providing for people who then consume the index. It's a statement (from the schema maintainer to any/all solr users) that in the absence of any more specific use cases provided by the solr instance maintainer, this field can be used as a field to search on.
Is that conceptual meaning significant enough to warrant saying "this is a crucual feature in solr" ? .. probably not, but it's how i think about it.
At this point though, i'm not sure that eliminating them really improves anything. yes we have the df param and the q.op param, but if/when those params aren't set, these values provide system wide defaults. We certain can deprecate them, but i don't know that that really helps our users in anyway. we could move them to solrconfig.xml, but that would require the same amount of work from users to modify configs as just eliminating them.
i guess what i would propose is that since the only real issue with them is potential confusion about where/why they exist, we:
...ie: quietly deprecate in 3x, but don't explicitly remove in 4x.
Similarity is a whole different ball of wax. it's already per-fieldType on trunk, so it really can't be moved to solrconfig.xml (even if it wasn't, it's something that's used at both index time and query time so it MUST be consistent in master/slave type setups, which is the driving force for when soemthing must live in schema.xml)