Patch that addresses some of this issue, with some failing tests and nocommits.
The existing autoGeneratePhraseQueries=true approach generates queries exactly as if the query had contained quotation marks, but as I mentioned above, this is inappropriate when splitOnWhitespace=false and the query text contains spaces.
The approach in the patch is to add a new QueryBuilder method to handle the autoGeneratePhraseQueries=true case. The query text is split on whitespace and these tokens' offsets are compared to those produced by the configured analyzer. When multiple non-overlapping tokens have offsets within the bounds of a single whitespace-separated token, a phrase query is created. If the original token is present as a token overlapping with the first split token, then a disjunction query is created with the original token and the phrase query of the split tokens.
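To make the offset comparison concrete, here is a minimal sketch of the grouping step. It is not the code from the patch and uses no Lucene classes: `AutoPhraseSketch`, `Tok` and `build` are hypothetical names, the analyzer output is passed in as a plain list of (term, startOffset, endOffset) triples, and the result is a query-like string rather than a Query object. Every token that spans a whole whitespace-separated token is treated as the "original", and tokens that cross a whitespace boundary (e.g. multi-word synonyms) are simply ignored, which is one of the cases the patch doesn't cover either.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AutoPhraseSketch {
    /** Hypothetical stand-in for an analyzed token: term text plus character offsets. */
    static final class Tok {
        final String term;
        final int start;
        final int end;
        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits the query text on whitespace, keeping each piece's offsets. */
    static List<Tok> splitOnWhitespace(String text) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) out.add(new Tok(text.substring(start, i), start, i));
        }
        return out;
    }

    /** Builds a query-like string by comparing analyzed offsets to whitespace offsets. */
    static String build(String text, List<Tok> analyzed) {
        List<String> clauses = new ArrayList<>();
        for (Tok ws : splitOnWhitespace(text)) {
            List<String> alternatives = new ArrayList<>(); // full-span ("original") tokens
            List<String> parts = new ArrayList<>();        // non-overlapping split tokens
            int lastEnd = -1;
            for (Tok t : analyzed) {
                // Skip tokens not fully inside this whitespace-separated token.
                if (t.start < ws.start || t.end > ws.end) continue;
                if (t.start == ws.start && t.end == ws.end) {
                    alternatives.add(t.term);
                } else if (t.start >= lastEnd) {
                    parts.add(t.term);
                    lastEnd = t.end;
                }
            }
            if (parts.size() >= 2) {
                alternatives.add("\"" + String.join(" ", parts) + "\""); // phrase
            } else if (parts.size() == 1) {
                alternatives.add(parts.get(0));
            }
            if (alternatives.size() == 1) {
                clauses.add(alternatives.get(0));
            } else if (alternatives.size() > 1) {
                clauses.add("(" + String.join(" OR ", alternatives) + ")"); // disjunction
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // Offsets as a word-delimiter-style filter that preserves the original might emit them.
        List<Tok> analyzed = Arrays.asList(
            new Tok("wi-fi", 0, 5), new Tok("wi", 0, 2),
            new Tok("fi", 3, 5), new Tok("router", 6, 12));
        System.out.println(build("wi-fi router", analyzed)); // (wi-fi OR "wi fi") router
    }
}
```

The sketch relies entirely on offsets being trustworthy, so it inherits the same weakness as the patch when the token stream reports broken offsets.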
I've added a couple of tests that show posincr/poslength/offset output from SynonymFilter and WordDelimiterFilter (likely the two most frequently used analysis components that can create split tokens), and both create corrupt token graphs of various kinds (e.g. LUCENE-6582, LUCENE-5051), so solving this problem in a complete way just isn't possible right now.
So I'm not happy with the approach in the patch. It only covers a subset of possible token graphs (e.g. more than one overlapping multi-term synonym doesn't work). And it's a lot of new code solving a problem that AFAIK no user has reported (does anybody even use autoGeneratePhraseQueries=true with classic QP?).
I'd be much happier if we could somehow get TermAutomatonQuery hooked into the query parsers, and then rewrite to simpler queries if possible: LUCENE-6824. The first step, though, is fixing SynonymFilter and friends so that they produce correct token graphs. Attempts to do this for SynonymFilter have stalled: LUCENE-6664. (I have a germ of an idea that might break the logjam - I'll post over there.)
For this issue, maybe instead of my patch, for now, we just disallow autoGeneratePhraseQueries=true when splitOnWhitespace=false.
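A rough sketch of what that restriction could look like, assuming the check lives in the two setters. `ParserOptionsSketch` is a hypothetical standalone class used only for illustration, not the classic QueryParser, and the exception type and messages are my assumptions:

```java
class ParserOptionsSketch {
    // Defaults are assumptions chosen so that the initial state is a valid combination.
    private boolean splitOnWhitespace = true;
    private boolean autoGeneratePhraseQueries = false;

    void setSplitOnWhitespace(boolean value) {
        // Reject the combination regardless of which setter is called last.
        if (!value && autoGeneratePhraseQueries) {
            throw new IllegalArgumentException(
                "splitOnWhitespace=false is not allowed when autoGeneratePhraseQueries=true");
        }
        this.splitOnWhitespace = value;
    }

    void setAutoGeneratePhraseQueries(boolean value) {
        if (value && !splitOnWhitespace) {
            throw new IllegalArgumentException(
                "autoGeneratePhraseQueries=true is not allowed when splitOnWhitespace=false");
        }
        this.autoGeneratePhraseQueries = value;
    }

    boolean getSplitOnWhitespace() {
        return splitOnWhitespace;
    }

    boolean getAutoGeneratePhraseQueries() {
        return autoGeneratePhraseQueries;
    }
}
```

Checking in both setters means the invalid combination is rejected no matter which option is set second.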