Description
Currently, WordDelimiterFilter (WDF) does not set the position length (posLen) attribute, so it creates token graphs like this:
With this patch (still a work in progress), it creates this graph instead:
This means that, today, positional queries are buggy when WDF is used at search time. Now that LUCENE-7603 is fixed, this change should make it possible to use positional queries with WDGF (the new WordDelimiterGraphFilter).
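The following minimal Java sketch makes the difference concrete. It assumes a hypothetical input `wi-fi`, split into word parts with word catenation enabled. It does not use Lucene's API: the record `Tok` and the helpers `flat`, `graph` and `paths` are made up for illustration, and tokens store absolute start positions instead of position increments. A token at position `pos` with length `posLen` is treated as an arc from node `pos` to node `pos + posLen`, and `paths` lists every route through the graph, roughly what has to be considered when a graph is turned into positional queries.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified token-graph model (not Lucene's API). Each token is
 * an arc from node pos to node pos + posLen.
 */
public class WdfGraphSketch {
  record Tok(String term, int pos, int posLen) {}

  // Flat output (assumed for "wi-fi"): the catenated "wifi" has posLen 1,
  // so it appears to occupy only position 0 and to be followed by "fi".
  static List<Tok> flat() {
    return List.of(new Tok("wi", 0, 1), new Tok("wifi", 0, 1), new Tok("fi", 1, 1));
  }

  // Graph output (assumed for "wi-fi"): "wifi" spans both positions.
  static List<Tok> graph() {
    return List.of(new Tok("wi", 0, 1), new Tok("wifi", 0, 2), new Tok("fi", 1, 1));
  }

  /** Enumerates every path of terms from node start to node end. */
  static List<String> paths(List<Tok> toks, int start, int end) {
    List<String> out = new ArrayList<>();
    if (start == end) {
      out.add(""); // reached the final node: one empty continuation
      return out;
    }
    for (Tok t : toks) {
      if (t.pos() != start) continue; // only arcs leaving this node
      for (String rest : paths(toks, start + t.posLen(), end)) {
        out.add(rest.isEmpty() ? t.term() : t.term() + " " + rest);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    int end = 2; // final node: the largest pos + posLen over all tokens
    System.out.println("flat:  " + paths(flat(), 0, end));
    System.out.println("graph: " + paths(graph(), 0, end));
  }
}
```

With the flat stream, the paths are `[wi fi, wifi fi]`, and `wifi fi` is not something the input said. Giving `wifi` a posLen of 2 leaves only `[wi fi, wifi]`.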
I'm also trying to produce holes properly: the patch removes the logic in the current WDF that swallows the hole when an entire token consists only of delimiters.
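As a rough illustration of the hole handling, here is a hypothetical Java model, not the actual filter code. It assumes that any token with no letters or digits counts as "just delimiters" and is dropped. With `keepHoles` off, the dropped token's position increment is simply discarded, approximating the swallowing behavior. With it on, the increment is carried over to the next emitted token, similar in spirit to how Lucene's FilteringTokenFilter preserves the positions of removed tokens. `Tok`, `allDelimiters` and `dropDelimiterTokens` are names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of dropping all-delimiter tokens, with or without holes. */
public class WdfHoleSketch {
  record Tok(String term, int posInc) {}

  // Assumption for this sketch: a delimiter is any non-letter, non-digit char.
  static boolean allDelimiters(String s) {
    return s.chars().noneMatch(Character::isLetterOrDigit);
  }

  /**
   * Drops all-delimiter tokens. If keepHoles is true, the dropped token's
   * position increment is added to the next emitted token, leaving a hole;
   * otherwise it is discarded and the hole is swallowed.
   */
  static List<Tok> dropDelimiterTokens(List<Tok> in, boolean keepHoles) {
    List<Tok> out = new ArrayList<>();
    int pending = 0; // increments of dropped tokens not yet applied
    for (Tok t : in) {
      if (allDelimiters(t.term())) {
        if (keepHoles) pending += t.posInc();
        continue;
      }
      out.add(new Tok(t.term(), t.posInc() + pending));
      pending = 0;
    }
    return out;
  }

  public static void main(String[] args) {
    List<Tok> in = List.of(new Tok("foo", 1), new Tok("--", 1), new Tok("bar", 1));
    System.out.println(dropDelimiterTokens(in, false)); // bar keeps posInc 1
    System.out.println(dropDelimiterTokens(in, true));  // bar gets posInc 2
  }
}
```

For an input like `foo -- bar`, keeping the hole means a phrase query for `foo bar` no longer matches as if the two words were adjacent.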
Surprisingly, it's quite easy to tweak WDF to create a graph (unlike, e.g., SynonymGraphFilter), because it already creates the necessary new positions, and its output graph never has side paths, except for single tokens that skip nodes because their posLen is greater than 1. So, I think, the only fix needed is to set posLen correctly. It also helps a lot that WDF already does its own buffering and sorting of new tokens.
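The "only set posLen" idea can be sketched as follows. This is a hypothetical Java simplification, not the patch itself: it assumes each word part already has its own position, so a catenation of parts `[from, to)` starts at part `from` and spans `to - from` positions. Only the concatenated token skips nodes, which matches the "no side paths" observation. The catenate-everything step loosely mirrors WDF's catenate-all option, and the tie-break in the sort (longer tokens first) is an arbitrary choice for this sketch. `Tok`, `catenate` and `emit` are invented names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: assign posLen to catenated tokens, then buffer and sort. */
public class WdfPosLenSketch {
  record Tok(String term, int pos, int posLen) {}

  // A catenation of parts [from, to) starts at part `from` and spans
  // to - from positions, so its posLen is to - from instead of 1.
  static Tok catenate(List<String> parts, int from, int to) {
    return new Tok(String.join("", parts.subList(from, to)), from, to - from);
  }

  static List<Tok> emit(List<String> parts) {
    List<Tok> buffer = new ArrayList<>();
    for (int i = 0; i < parts.size(); i++) {
      buffer.add(new Tok(parts.get(i), i, 1)); // each part occupies one position
    }
    if (parts.size() > 1) {
      buffer.add(catenate(parts, 0, parts.size())); // like catenating all parts
    }
    // Sort the buffered tokens by start position; on ties, longer tokens first.
    buffer.sort((a, b) -> a.pos() != b.pos()
        ? Integer.compare(a.pos(), b.pos())
        : Integer.compare(b.posLen(), a.posLen()));
    return buffer;
  }

  public static void main(String[] args) {
    for (Tok t : emit(List.of("wi", "fi", "net"))) {
      System.out.println(t.term() + " pos=" + t.pos() + " posLen=" + t.posLen());
    }
  }
}
```

For the hypothetical parts `wi`, `fi`, `net`, the sorted output is `wifinet` (pos 0, posLen 3) followed by `wi`, `fi` and `net`, each with posLen 1.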