The flexibility and capabilities of the Lucene classification module's classifiers could be improved in the following ways:
- make it possible to use the classifiers "online" (or provide online versions of them), so that when the underlying index (reader) is updated, a classifier doesn't need to be retrained to take newly added docs into account
- optionally pass a different Analyzer (or directly a TokenStream) together with the text to be classified, to specify custom tokenization/filtering
- normalize the score calculations of the existing classifiers
- provide accuracy and speed tests based on publicly available datasets
- add more Lucene-based classification algorithms
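One possible approach to score normalization, sketched under stated assumptions: each classifier's raw per-class scores are assumed to be available as a map from class label to a `double`, and they are rescaled with a softmax so that every classifier yields non-negative scores summing to 1. The `ScoreNormalizer` class and its `softmax` method are hypothetical illustrations, not part of Lucene's classification API, and softmax is only one candidate normalization among several.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (NOT part of the Lucene API): turns raw per-class
 * scores into a probability-like distribution via softmax, so scores from
 * different classifiers can be compared on the same [0, 1] scale.
 */
final class ScoreNormalizer {

    private ScoreNormalizer() {}

    static Map<String, Double> softmax(Map<String, Double> rawScores) {
        // Find the maximum raw score; subtracting it before exp() avoids overflow.
        double max = Double.NEGATIVE_INFINITY;
        for (double s : rawScores.values()) {
            max = Math.max(max, s);
        }

        // Exponentiate the shifted scores and accumulate their sum.
        Map<String, Double> exps = new LinkedHashMap<>();
        double sum = 0.0;
        for (Map.Entry<String, Double> e : rawScores.entrySet()) {
            double v = Math.exp(e.getValue() - max);
            exps.put(e.getKey(), v);
            sum += v;
        }

        // Divide by the sum so the normalized scores add up to 1.
        Map<String, Double> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : exps.entrySet()) {
            normalized.put(e.getKey(), e.getValue() / sum);
        }
        return normalized;
    }
}
```

Softmax preserves the ranking of the raw scores and also accepts negative inputs such as log-likelihoods; a simple division by the sum of scores would be an alternative only when all raw scores are already non-negative.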
Specific subtasks should be created for each of the above topics so that each can be discussed in depth.