> 1) would it make sense for the keep option to refer to a file, using the same format as StopFilter ... that way it's easy to reuse the same file (which seems like it would be a common case).
Probably. That is a good idea.
> 2) what is the point of forceFirstLetter="true" ? ... if you want to force capitalization, what's the point of making the keep list?
This one came out of necessity!
With keep="the ..." and input:
"Grand army of the Republic", "the arts"
I want: "Grand Army of the Republic" and "The Arts"
"forceFirstLetter" only applies to the first character in the token, not to each word.
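To make the interaction concrete, here is a minimal sketch of the behavior described above. It is not the code from the patch: the class and method names are hypothetical, the keep list is assumed to contain "of" as well as "the" (the original list is elided), and keep words are assumed to be matched case-insensitively and left exactly as typed.

```java
import java.util.Locale;
import java.util.Set;

public class KeepListSketch {
    // Hypothetical helper (not the patch's code): capitalize every word in a token,
    // leave words found in the keep list untouched, then optionally upper-case
    // the very first character of the whole token.
    static String capitalize(String token, Set<String> keep, boolean forceFirstLetter) {
        StringBuilder out = new StringBuilder();
        for (String word : token.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            String lower = word.toLowerCase(Locale.ROOT);
            if (keep.contains(lower)) {
                // Assumption: keep words are emitted exactly as they were typed.
                out.append(word);
            } else if (!lower.isEmpty()) {
                out.append(Character.toUpperCase(lower.charAt(0)))
                   .append(lower.substring(1));
            }
        }
        // forceFirstLetter touches only the first character of the token,
        // so a leading keep word such as "the" still becomes "The".
        if (forceFirstLetter && out.length() > 0) {
            out.setCharAt(0, Character.toUpperCase(out.charAt(0)));
        }
        return out.toString();
    }
}
```

Under these assumptions, capitalize("the arts", Set.of("the", "of"), true) yields "The Arts", while passing false for forceFirstLetter would leave the leading "the" in lowercase.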
> 3) is okPrefix going to force the case for things that have that prefix in an alternate case, or only allow that casing to remain (i.e., if I index McKeen, Mckeen, mckeen and MCKEEN, what tokens do I wind up with?)
As written, if the prefix matches exactly (case-sensitively), it assumes the word's capitalization is already correct and leaves it alone. For my input data this is sufficient, but it should probably do something smarter.
So, if you index "McKeen, Mckeen, mckeen, MCKEEN and McKEEN", you would get:
"McKeen, Mckeen, Mckeen, Mckeen And McKEEN"
If "okPrefix" were instead treated as the required capitalization for any input whose lowercased prefix matches "mck", it would give:
"McKeen, McKeen, McKeen, McKeen And McKeen"
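The two behaviors can be compared side by side in a small sketch. This is only an illustration under stated assumptions, not the patch's implementation: the class and method names are hypothetical, okPrefix is taken to be "McK", and words that do not match are simply capitalized on their first letter and lowercased otherwise.

```java
import java.util.Locale;

public class OkPrefixSketch {
    // Default capitalization: first letter upper-case, the rest lower-case.
    static String plain(String word) {
        if (word.isEmpty()) {
            return word;
        }
        String lower = word.toLowerCase(Locale.ROOT);
        return Character.toUpperCase(lower.charAt(0)) + lower.substring(1);
    }

    // Behavior as described for the current code: an exact, case-sensitive
    // prefix match means the word is trusted and passed through unchanged.
    static String asWritten(String word, String okPrefix) {
        return word.startsWith(okPrefix) ? word : plain(word);
    }

    // Alternative discussed above: a case-insensitive prefix match imposes the
    // prefix's own casing and lower-cases the remainder of the word.
    static String forcePrefixCase(String word, String okPrefix) {
        if (word.regionMatches(true, 0, okPrefix, 0, okPrefix.length())) {
            return okPrefix + word.substring(okPrefix.length()).toLowerCase(Locale.ROOT);
        }
        return plain(word);
    }
}
```

With okPrefix "McK", asWritten maps "MCKEEN" to "Mckeen" and leaves "McKEEN" alone, whereas forcePrefixCase maps both to "McKeen".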