Patch that brings Andrzej's patch up to date with trunk, and adds tests for query-time functionality.
I had assumed that PreAnalyzedField fields would use PreAnalyzedTokenizer at query time, but that is not (currently) the case: FieldType.DefaultAnalyzer is used instead. This patch changes the behavior so that PreAnalyzedTokenizer is used when no analyzer is specified.
However, there is a chicken-and-egg interaction between PreAnalyzedTokenizer and QueryBuilder.createFieldQuery(), which aborts before performing any tokenization if the supplied analyzer's attribute factory doesn't contain a TermToBytesRefAttribute. But PreAnalyzedTokenizer doesn't have any attributes defined until the input stream is consumed, in reset(). Robert Muir added a comment as part of LUCENE-5388 to PreAnalyzedTokenizer's ctor, where AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY is set as the attribute factory rather than the default packed implementation: "we don't pack attributes: since we are used for (de)serialization and dont want bloat."
This patch moves the stream.reset() call in QueryBuilder.createFieldQuery() in front of the TermToBytesRefAttribute check, so that PreAnalyzedTokenizer (and any other tokenizer that doesn't have a pre-added set of attributes) has a chance to populate its attributes. It also moves the addAttribute(PositionIncrementAttribute.class) call to after the TermToBytesRefAttribute check, since that attribute won't be needed if no tokenization will be performed.
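To make the ordering issue concrete, here is a toy model of it. This is only a sketch: LazyStream and OrderingSketch are hypothetical stand-ins I made up for illustration, attributes are modeled as plain strings, and none of this is the actual Lucene code or the patch itself.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a token stream that, like PreAnalyzedTokenizer,
// only learns which attributes it has once reset() reads the input.
class LazyStream {
    private final Set<String> attributes = new HashSet<>();

    // Attributes are registered here, not in the constructor.
    void reset() {
        attributes.add("TermToBytesRefAttribute");
        attributes.add("PositionIncrementAttribute");
    }

    boolean hasAttribute(String name) {
        return attributes.contains(name);
    }
}

// Hypothetical model of the two orderings of the check and the reset.
class OrderingSketch {
    // Check first: a lazily populated stream is rejected before reset() runs.
    static boolean checkThenReset(LazyStream stream) {
        if (!stream.hasAttribute("TermToBytesRefAttribute")) {
            return false; // abort before any tokenization
        }
        stream.reset();
        return true;
    }

    // Reset first: the stream gets a chance to register its attributes.
    static boolean resetThenCheck(LazyStream stream) {
        stream.reset();
        return stream.hasAttribute("TermToBytesRefAttribute");
    }
}
```

In this model, checkThenReset(new LazyStream()) returns false while resetThenCheck(new LazyStream()) returns true, which is the difference the reordering is meant to achieve.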
An alternative fix for the chicken-and-egg problem might be to have PreAnalyzedTokenizer always include a dummy TermToBytesRefAttribute implementation and then remove it when reset() is called, but that seems hackish.
I haven't run the full test suite with this patch yet, but the included query-time PreAnalyzedField tests pass.
I welcome feedback.