here's a test.
the problem is that a previous filter 'lengthens' this term by folding æ -> ae, but EdgeNGramFilter computes the offsets "additively": offsetAtt.setOffset(tokStart + start, tokStart + end);
Because of this, if a word has been 'lengthened' by a previous filter, EdgeNGramFilter will produce offsets that extend past the end of the original text (and probably bogus ones if it's been shortened).
I think we should do what WDF (WordDelimiterFilter) does here: if the original offsets have already been changed (startOffset + termLength != endOffset), then we should simply preserve them for the new subwords.
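To make the difference concrete, here is a rough sketch in plain Java (no Lucene dependency). The class and method names are hypothetical and only model the arithmetic; this is not the actual EdgeNGramFilter code or a proposed patch. It assumes the original text is "æ" at offsets [0,1), folded to the two-character term "ae".

```java
// Hypothetical stand-alone model of the offset arithmetic; not Lucene code.
final class EdgeNGramOffsetSketch {

    // "Additive" offsets, mirroring setOffset(tokStart + start, tokStart + end):
    // the gram's position inside the term is added to the token's start offset.
    public static int[] additiveOffsets(int tokStart, int start, int end) {
        return new int[] { tokStart + start, tokStart + end };
    }

    // Suggested behaviour: if an earlier filter changed the term length
    // (tokStart + termLength != tokEnd), keep the token's original offsets
    // for every gram instead of computing new ones.
    public static int[] preservingOffsets(int tokStart, int tokEnd, int termLength,
                                          int start, int end) {
        boolean offsetsChanged = tokStart + termLength != tokEnd;
        if (offsetsChanged) {
            return new int[] { tokStart, tokEnd };
        }
        return new int[] { tokStart + start, tokStart + end };
    }

    public static void main(String[] args) {
        // Original text "æ" occupies [0,1); after folding the term is "ae" (length 2).
        int[] bad = additiveOffsets(0, 0, 2);
        int[] ok = preservingOffsets(0, 1, 2, 0, 2);
        System.out.println("additive:   " + bad[0] + "," + bad[1]);
        System.out.println("preserving: " + ok[0] + "," + ok[1]);
    }
}
```

With these inputs the additive version yields an end offset of 2 for a text that is only 1 character long, while the preserving version falls back to the original [0,1). When the term length still matches the offsets, both versions give the same result.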
I added a check for this to BaseTokenStreamTestCase... now to see if anything else fails...