Status: Closed
Resolution: Fixed
Below you can see that when I request only numeric concatenations (not words), some words are still output, ignoring the options I have provided, and the behavior is very inconsistent from one input to the next.
// Flags: only catenateNumbers and stemEnglishPossessive are set.
// The null passed before the posIncs array is the types argument (not checked).
assertWdf("Super-Duper-XL500-42-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
    new String[] { "42", "AutoCoder" },
    new int[] { 18, 21 },
    new int[] { 20, 30 },
    null,
    new int[] { 1, 1 });
assertWdf("Super-Duper-XL500-42-AutoCoder's-56", 0,0,0,1,0,0,0,0,1, null,
    new String[] { "42", "AutoCoder", "56" },
    new int[] { 18, 21, 33 },
    new int[] { 20, 30, 35 },
    null,
    new int[] { 1, 1, 1 });
assertWdf("Super-Duper-XL500-AB-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
    new String[] { },
    new int[] { },
    new int[] { },
    null,
    new int[] { });
assertWdf("Super-Duper-XL500-42-AutoCoder's-BC", 0,0,0,1,0,0,0,0,1, null,
    new String[] { "42" },
    new int[] { 18 },
    new int[] { 20 },
    null,
    new int[] { 1 });
where assertWdf is:
void assertWdf(String text, int generateWordParts, int generateNumberParts,
    int catenateWords, int catenateNumbers, int catenateAll,
    int splitOnCaseChange, int preserveOriginal, int splitOnNumerics,
    int stemEnglishPossessive, CharArraySet protWords, String[] expected,
    int[] startOffsets, int[] endOffsets, String[] types, int[] posIncs)
    throws IOException {
  TokenStream ts = new WhitespaceTokenizer(new StringReader(text));
  WordDelimiterFilter wdf = new WordDelimiterFilter(ts, generateWordParts,
      generateNumberParts, catenateWords, catenateNumbers, catenateAll,
      splitOnCaseChange, preserveOriginal, splitOnNumerics,
      stemEnglishPossessive, protWords);
  assertTokenStreamContents(wdf, expected, startOffsets, endOffsets, types, posIncs);
}
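The start and end offsets used in the assertions above can be sanity-checked without Lucene. The following is only a minimal sketch using plain String.indexOf; the class name OffsetCheck and the method offsetsOf are hypothetical helpers introduced here for illustration and are not part of the test suite or of WordDelimiterFilter.

```java
public class OffsetCheck {

    /**
     * Hypothetical helper: returns {start, end} for the first occurrence of
     * part in text at or after index from. The end offset is exclusive,
     * matching the convention used by the offsets in the assertions.
     */
    public static int[] offsetsOf(String text, String part, int from) {
        int start = text.indexOf(part, from);
        if (start < 0) {
            throw new IllegalArgumentException("'" + part + "' not found in '" + text + "'");
        }
        return new int[] { start, start + part.length() };
    }

    public static void main(String[] args) {
        String text = "Super-Duper-XL500-42-AutoCoder's-56";
        int from = 0;
        // Walk the parts left to right so that each search starts after the previous match.
        for (String part : new String[] { "42", "AutoCoder", "56" }) {
            int[] o = offsetsOf(text, part, from);
            System.out.println(part + " -> [" + o[0] + ", " + o[1] + ")");
            from = o[1];
        }
    }
}
```

For the second input this gives [18, 20) for "42", [21, 30) for "AutoCoder" and [33, 35) for "56", which agrees with the offset arrays in the assertions, so the inconsistency reported here lies in which tokens are emitted, not in their offsets.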