Following on from https://issues.apache.org/jira/browse/LUCENE-6968, this adds a query parser that builds queries whose score gives a measure of Jaccard similarity. The initial patch also includes the banded queries that were proposed on the original issue.
I have one outstanding question:
- Should the score from the overall query be normalised?
Note that the band count is currently approximate and may be one less than the number of bands used in practice.
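For readers unfamiliar with the idea, here is a minimal, standalone sketch of estimating Jaccard similarity from two min-hash signatures and grouping a signature into bands. It is not the code in the patch and does not use any Lucene API; the class name `MinHashBandSketch`, the methods `estimateJaccard` and `bands`, and the sample signatures are all hypothetical. It also shows one way a band count could come out one lower than the number of bands produced (integer division versus a trailing partial band); whether that is the cause of the approximation noted above is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the patch or of Lucene.
public class MinHashBandSketch {

    // Estimates Jaccard similarity as the fraction of signature
    // positions at which the two min-hash signatures agree.
    static double estimateJaccard(int[] a, int[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("signatures must be non-empty and of equal length");
        }
        int matches = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) {
                matches++;
            }
        }
        return (double) matches / a.length;
    }

    // Splits a signature into consecutive bands of rowsPerBand hashes.
    // A trailing partial band is kept, so the number of bands is
    // ceil(length / rowsPerBand), which can be one more than the
    // integer division length / rowsPerBand.
    static List<int[]> bands(int[] signature, int rowsPerBand) {
        if (rowsPerBand <= 0) {
            throw new IllegalArgumentException("rowsPerBand must be positive");
        }
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < signature.length; start += rowsPerBand) {
            int end = Math.min(start + rowsPerBand, signature.length);
            result.add(Arrays.copyOfRange(signature, start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up signatures that agree in 4 of 6 positions.
        int[] a = {3, 17, 42, 8, 99, 5};
        int[] b = {3, 17, 40, 8, 91, 5};
        System.out.println("estimated Jaccard: " + estimateJaccard(a, b));
        // 6 hashes with 4 rows per band: integer division gives 1, but 2 bands are produced.
        System.out.println("bands produced: " + bands(a, 4).size());
        System.out.println("integer division: " + (a.length / 4));
    }
}
```

In a banded query, each band would typically become one clause that matches only if every hash in the band matches; that mapping is left out here because it depends on the patch.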