Affects Version/s: None
Fix Version/s: None
PreAnalyzedField fails to index documents that have no tokens, such as the following data:
PreAnalyzedField consumes field values that have already been analyzed. The format of a pre-analyzed value is as follows:
As the documentation mentions, "str" and "tokens" are optional, i.e., each key may be empty or omitted entirely. However, when "tokens" is empty or not defined, PreAnalyzedField throws an IOException and fails to index the document.
This error is related to the behavior of Field#tokenStream. This method tries to create a TokenStream using the following steps (NOTE: assume indexed=true):
- If the field has a tokenStream value, it returns that value.
- Otherwise, it creates a tokenStream by parsing the stored value.
If the pre-analyzed value doesn't have tokens, the second step is executed. Unfortunately, since PreAnalyzedField always returns PreAnalyzedAnalyzer as the index analyzer and the stored value (i.e., the value of "str") is not in the pre-analyzed format, this step fails with a pre-analyzed format error (i.e., an IOException).
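The two steps and the resulting failure can be modeled roughly as follows. This is a simplified sketch in plain Java, not the actual Lucene/Solr code: the class PreAnalyzedFailureSketch, its methods, and the "starts with '{'" format check are hypothetical stand-ins, and the sketch assumes the JSON form of the pre-analyzed format.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

// Simplified model of the failure; these are NOT the real Lucene/Solr classes.
final class PreAnalyzedFailureSketch {

    // Hypothetical stand-in for PreAnalyzedAnalyzer: it accepts only input that
    // looks like a pre-analyzed JSON object and rejects everything else.
    static List<String> parsePreAnalyzed(String value) throws IOException {
        if (value == null || !value.trim().startsWith("{")) {
            throw new IOException("Invalid pre-analyzed format: " + value);
        }
        return Collections.emptyList(); // token extraction is omitted in this sketch
    }

    // Hypothetical stand-in for Field#tokenStream with indexed=true.
    static List<String> tokenStream(List<String> presetTokens, String storedValue)
            throws IOException {
        if (presetTokens != null) {
            return presetTokens;              // step 1: reuse the existing stream
        }
        return parsePreAnalyzed(storedValue); // step 2: parse the stored value
    }

    public static void main(String[] args) {
        try {
            // No tokens were supplied, so step 2 runs on the plain "str" value.
            tokenStream(null, "hello world");
        } catch (IOException e) {
            System.out.println("Indexing failed: " + e.getMessage());
        }
    }
}
```

With tokens present, step 1 returns immediately and the stored value is never parsed, which is why only token-less documents hit the error.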
1. Download the latest Solr package and set up a Solr server according to the Solr Tutorial.
2. Add the following fieldType and field to the schema.
3. Index the following documents; Solr will throw an IOException.
Because there is no need to analyze again when "tokens" is empty or not set, we can avoid this error by setting an EmptyTokenStream as the tokenStream instead, as in the following code:
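A minimal sketch of the idea, using plain Java stand-ins rather than the real Lucene classes. The class EmptyTokensFixSketch and the helper streamFor are hypothetical names introduced only for illustration, and an empty list plays the role of EmptyTokenStream; this is not the actual patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative model only; these are NOT the real Lucene/Solr classes.
final class EmptyTokensFixSketch {

    // Hypothetical helper: choose the stream to attach to the field.
    // When "tokens" is missing or empty, return an empty stream instead of null,
    // so the field always carries a stream and the stored value is never re-parsed.
    static List<String> streamFor(List<String> parsedTokens) {
        if (parsedTokens == null || parsedTokens.isEmpty()) {
            return Collections.emptyList(); // plays the role of EmptyTokenStream
        }
        return parsedTokens;
    }

    public static void main(String[] args) {
        System.out.println(streamFor(null).size());                     // no "tokens" key
        System.out.println(streamFor(Arrays.asList("one", "two")).size()); // tokens present
    }
}
```

Because the returned stream is never null, the first step of Field#tokenStream always applies, and the fallback that parses the stored value is not reached.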