Don't forget that this auto-phrase-gen is buggy: if the user's query
is wi fi, then this will not turn into a phrase.
Really, it's QueryParser that's buggy: it should not assume it can
pre-split on whitespace.
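
To make the wi fi problem concrete, here is a toy Python sketch. It is not the real QueryParser or analyzer code: `analyze` and `parse` are made-up names, and the "phrase if one chunk yields several tokens" rule is only my simplified model of auto-phrase-gen. It shows that once the parser has split on whitespace, the analyzer never sees "wi" and "fi" together.

```python
import re

def analyze(chunk):
    """Toy analyzer (hypothetical): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", chunk.lower()) if t]

def parse(query):
    """Toy parser (hypothetical): pre-split on whitespace, analyze each
    chunk on its own, and emit a phrase when one chunk yields several tokens."""
    clauses = []
    for chunk in query.split():
        tokens = analyze(chunk)
        if len(tokens) > 1:
            clauses.append(("phrase", tuple(tokens)))
        elif tokens:
            clauses.append(("term", tokens[0]))
    return clauses

print(parse("wi-fi"))  # -> [('phrase', ('wi', 'fi'))]
print(parse("wi fi"))  # -> [('term', 'wi'), ('term', 'fi')]
```

The same two tokens come out of the analyzer either way; only the whitespace pre-split decides whether they become a phrase.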
As Robert has pointed out, even if the feature weren't buggy, there's
no evidence auto-phrase-gen actually improves relevance even for
Yet it's most definitely disastrous for non-whitespace languages (CJK,
This is why, in my opinion, if we must pick a single global default
(for the 'text' field in Solr's example schema.xml), it should be
disabled by default: it's buggy for English and catastrophic for
To fix this "correctly", we somehow need a better QueryParser/Analyzer
interaction, such that all variants of wifi (WiFi, wifi, wi fi, wi-fi)
are consistently mapped during indexing and searching. Just adding a
new per-token attr doesn't fix it (the wi fi example, above).
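
As a rough illustration of what "consistently mapped" would mean, here is a Python sketch under heavy assumptions: the `COMPOUNDS` table and `analyze_full_text` are hypothetical, not anything in Lucene or Solr, and a real fix would look different. The point is only that the analysis has to see the whole text, across whitespace, on both the index and the query side.

```python
import re

# Hypothetical table of token pairs to join; purely illustrative.
COMPOUNDS = {("wi", "fi"): "wifi"}

def analyze_full_text(text):
    """Tokenize the whole text first, then join known compound pairs."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUNDS:
            out.append(COMPOUNDS[pair])
            i += 2  # consumed both halves of the compound
        else:
            out.append(tokens[i])
            i += 1
    return out

for variant in ("WiFi", "wifi", "wi fi", "wi-fi"):
    print(variant, "->", analyze_full_text(variant))  # each -> ['wifi']
```

This only works because the function receives "wi fi" as one string; run per whitespace-delimited chunk, it could never join the pair.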
I'm not sure what that would accomplish by itself though... it's not like Solr is much of an out-of-the-box solution for anything.
We have a default example so that people can easily run through the tutorial, and execute examples on wiki pages.
I suspect many apps take the default solrconfig/schema and run with
it / iteratively tweak it.
Solr doesn't have an installer though... you unzip and "cd example; java -jar start.jar".
Maybe we insert a "cp schema.xml schema.xml" in between
those two steps? This would avoid the global default, ie, force an
Or maybe we make separate default fieldTypes in schema.xml
(text_whitespace, text_non_whitespace – need better names)?
Or, maybe we make this setting take three values: unset, on, off. It
defaults to unset, but Solr refuses to run with this value, throwing
an exception saying you must set it?
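
A minimal Python sketch of the three-value idea, just to show the shape. The names `AutoPhrase`, `resolve_auto_phrase` and the `auto_phrase_gen` key are hypothetical, and this is not how Solr reads its schema.

```python
from enum import Enum

class AutoPhrase(Enum):
    UNSET = "unset"
    ON = "on"
    OFF = "off"

def resolve_auto_phrase(config):
    """Return True/False for on/off; refuse to start when the value is unset."""
    # A missing key is treated as "unset" (assumption for this sketch).
    setting = AutoPhrase(config.get("auto_phrase_gen", "unset"))
    if setting is AutoPhrase.UNSET:
        raise ValueError(
            "auto_phrase_gen has no default: set it to 'on' or 'off'"
        )
    return setting is AutoPhrase.ON

print(resolve_auto_phrase({"auto_phrase_gen": "off"}))  # -> False
```

Calling `resolve_auto_phrase({})` raises `ValueError`, so nobody runs with a value they did not choose.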
Something along these lines would let us avoid having to agree on a
global default, ie, make the choice explicit.
This is just like what we did with maxFieldLength a while back. Previously
it silently truncated after 10K terms, which was a dangerous default. So, we
forced the choice by making it a required param in IW. (Later we
changed the default to no truncation and made it no longer required.)