Umm..., if you don't like indexing and querying in the unit test, where should I put the join test that uses NGramTokenizer? It would be nice to have that join test in a proper place.
My point is, I don't think the test needs to do any indexing/querying at all to satisfy the change. It adds absolutely nothing to the test and only complicates the matter.
I put testIndexAndQuery in the code because other tests, such as the KeywordAnalyzer tests in core, include index-and-query code in their unit tests.
Just because another test does it doesn't make it right.
If we tokenize "This is an example" with a whitespace tokenizer, the tokens are
"This", "is", "an", "example"
and the positions are 0, 1, 2, 3.
If we tokenize it into 2-grams, the tokens are
"Th" "hi" "is" "s " " i" "is" "s " " a" "an" "n " " e" "ex" "xa" "am" "mp" "pl" "le"
and the positions are 0, 1, 2, 3, 4, ..., 16.
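The two token streams above can be reproduced with a small plain-Java sketch. This is not Lucene's tokenizer API: `whitespaceTokens` and `nGrams` are hypothetical helpers, and the sketch assumes, as described above, that every token (including each 2-gram) advances the position by exactly 1, so a token's position is simply its index in the list.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenPositions {
    // Hypothetical helper: split on single spaces; list index == token position.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.split(" ")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // Hypothetical helper: every character n-gram of the whole string, spaces included.
    // Assumes one position per gram, so list index == token position.
    static List<String> nGrams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "This is an example";
        List<String> ws = whitespaceTokens(text);
        for (int pos = 0; pos < ws.size(); pos++) {
            System.out.println(pos + "\t\"" + ws.get(pos) + "\"");
        }
        List<String> grams = nGrams(text, 2);
        for (int pos = 0; pos < grams.size(); pos++) {
            System.out.println(pos + "\t\"" + grams.get(pos) + "\"");
        }
    }
}
```

Running it lists the four words at positions 0 to 3 and the seventeen 2-grams at positions 0 to 16, matching the lists above.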
Yes, I understand how it currently works. My question is more whether this is the right way of doing it. I don't know that it is, but that is a bigger question than just you and me. If we are willing to accept that this issue is a bug, then it raises plenty of other problems for position-related queries. For example, I think it makes sense to search for "th ex" as a phrase query, but the positions make that impossible (at least without a lot of slop).
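To make the "th ex" example concrete, here is a rough plain-Java sketch (again not Lucene code) that finds where the word-initial grams land in the 2-gram stream. It assumes the text is lowercased before n-gramming, so "Th" becomes "th", and that each gram occupies its own position; `nGrams` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

public class PhraseGap {
    // Hypothetical helper: character n-grams in order; list index == position
    // (assumes a position increment of 1 per gram).
    static List<String> nGrams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumption: a lowercasing step runs before n-gramming.
        List<String> grams = nGrams("This is an example".toLowerCase(), 2);
        int th = grams.indexOf("th"); // first gram of "this"
        int ex = grams.indexOf("ex"); // first gram of "example"
        System.out.println("th at " + th + ", ex at " + ex + ", gap " + (ex - th));
    }
}
```

This prints `th at 0, ex at 11, gap 11`: an exact phrase match wants the two terms at consecutive positions, but here they sit 11 positions apart, which is why only a large slop would bridge them.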