Details
Description
This is related to OAK-3276, where the intent was to use StandardAnalyzer by default (instead of OakAnalyzer). As discussed there, we need a specific word delimiter, which isn't possible with StandardAnalyzer, so we should instead switch over to StandardTokenizer in OakAnalyzer itself.
A few motivations to do that:
- Better Unicode support
- ClassicTokenizer is the old (~Lucene 3.1) implementation of the standard tokenizer
One of the key differences between the classic and standard tokenizers is the way they delimit words (the standard tokenizer follows Unicode text segmentation rules), but that difference is nullified because we apply our own WordDelimiterFilter after tokenization.
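To illustrate why the delimiting difference washes out, here is a minimal sketch that uses only the JDK, not Lucene. Everything in it is an assumption made for illustration: the class DelimiterSketch and its methods are hypothetical, a whitespace split and java.text.BreakIterator stand in for two tokenizers with different word-boundary rules, and delimit() is a toy stand-in for a word delimiter step that splits on any character that is not a letter or digit. It is not how OakAnalyzer or WordDelimiterFilter is implemented.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; none of these names exist in Oak or Lucene.
class DelimiterSketch {

    // Stand-in tokenizer A: splits on whitespace only.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Stand-in tokenizer B: uses the JDK's locale-aware word boundaries.
    static List<String> breakIteratorTokens(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String t = text.substring(start, end).trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Toy delimiter step (assumption): split each token on runs of
    // characters that are neither letters nor digits.
    static List<String> delimit(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            for (String part : t.split("[^\\p{L}\\p{N}]+")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "wi-fi foo.bar_baz";
        // The two tokenizers may disagree before the delimiter step...
        System.out.println(whitespaceTokens(text));
        System.out.println(breakIteratorTokens(text));
        // ...but the shared delimiter step brings them to the same tokens.
        System.out.println(delimit(whitespaceTokens(text)));
        System.out.println(delimit(breakIteratorTokens(text)));
    }
}
```

In this toy model, whatever boundaries the upstream tokenizer picks inside a punctuated word, the shared delimiter step splits on the same characters afterwards, so both pipelines end with the same tokens. The real filter is configurable and has more rules, so this only conveys the general idea.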