rename StringField to KeywordField, making it more obvious that this field isn't tokenized. Then a KeywordsField can take a String or BytesRef in ctors.
Both Lucene and Solr suffer from a conflation of two concepts: treating an input stream as a single token ("a keyword") and treating it as a sequence of tokens ("a sequence of keywords"). We have "KeywordTokenizer", which does NOT tokenize the input stream into "a sequence of keywords"; it emits the entire input as a single token. Yet the term "keyword search" is commonly used to describe the ability of search engines to find "individual keywords" in extended streams of "text". That is a clear reference to a "keyword" as one token in a tokenized stream.
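To make the two behaviors concrete, here is a minimal plain-Java sketch. It has no Lucene dependency, and the class and method names are hypothetical. The first method roughly mimics what KeywordTokenizer does, which is also how StringField indexes its value: the whole input becomes one token. The second mimics a simple whitespace tokenizer, which turns the input into a sequence of tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only, NOT Lucene's API: contrasts "the whole input
// as one token" with "the input as a sequence of tokens".
public class KeywordVsTokens {

    // Roughly what KeywordTokenizer does: the entire input is a single token.
    static List<String> wholeInputAsSingleToken(String input) {
        List<String> tokens = new ArrayList<>();
        tokens.add(input);
        return tokens;
    }

    // Roughly what a whitespace tokenizer does: split on runs of whitespace.
    static List<String> whitespaceTokens(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.trim().split("\\s+")) {
            if (!part.isEmpty()) { // "".split(...) yields one empty string
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "new york city";
        System.out.println(wholeInputAsSingleToken(input)); // prints [new york city]
        System.out.println(whitespaceTokens(input));        // prints [new, york, city]
    }
}
```

Under the "keyword search" usage, only the second output is a list of keywords. The first is one uninterpreted string.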
So I don't understand how anyone can claim that renaming StringField to KeywordField makes anything "obvious". It seems to me to add to the existing confusion rather than clarify anything. The term "keyword" should be treated as a synonym for "token" or "term", NOT as a synonym for "string" or "raw character sequence".
I agree that we need a term for "raw, uninterpreted character sequence", but it seems to me that "string" is a more "obvious" candidate than "keyword".
There has been some grumbling at the Solr level that KeywordTokenizer should be renamed to... something, anything but KeywordTokenizer, whose name "obviously" implies that the input stream will be tokenized into a sequence of keywords, which it is not.
In an effort to resolve this ongoing confusion, can somebody provide some historical background on how KeywordTokenizer got its name, and on why a subset of people continue to refer to an uninterpreted sequence of characters as a "keyword" rather than a string? I checked the Javadoc, Jira, and even the source code, but came up empty.
In short, it is a real eye-opener to see a claim that the term "keyword" in any way makes it "obvious" that input is not tokenized!!
Maybe we could fix this for 5.0 to have a cleaner set of terminology going forward. At a minimum, we should have some clarifying language in the Javadoc. And hopefully we can refrain from making the confusion/conflation worse by renaming StringField to KeywordField.
Then a KeywordsField can take a String
Is that simply a typo, or is the intent to have both a KeywordField (singular) and a KeywordsField (plural)? I presume it is a typo, but... maybe it's a Freudian slip that highlights the semantic difficulty that persists in Lucene terminology (and hence infects Solr terminology as well).