As explained in the write-up, implementations of many state-of-the-art ranking models have been added to Apache Lucene.
This issue aims to add the Divergence from Independence (DFI) model, which is the non-parametric counterpart of the Divergence from Randomness (DFR) framework.
DFI is both parameter-free and non-parametric:
- parameter-free: it does not require any parameter tuning or training.
- non-parametric: it does not make any assumptions about word frequency distributions on document collections.
It is highly recommended not to remove stopwords (very common terms such as the, of, and, to, a, in, for, is, on, that) with this similarity.
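To make the stopword recommendation more concrete, here is a minimal, self-contained sketch of a standardized divergence-from-independence score. It is not the code attached to this issue, and it does not use any Lucene API. The class name DfiSketch, the method score, and the exact formula (observed minus expected frequency, divided by the square root of the expected frequency, then log2-dampened) are assumptions made for illustration. Under those assumptions, a term that occurs no more often than independence predicts scores zero, which may be one reason very common terms need no separate removal step.

```java
// Illustrative sketch only: hypothetical names and an assumed formula,
// not the implementation proposed in this issue.
final class DfiSketch {

    /**
     * Standardized divergence-from-independence score for one term in one document.
     *
     * @param tf          observed frequency of the term in the document
     * @param docLength   number of tokens in the document
     * @param termTotal   total occurrences of the term in the collection (assumed > 0)
     * @param totalTokens total number of tokens in the collection (assumed > 0)
     */
    static double score(double tf, double docLength, double termTotal, double totalTokens) {
        // Frequency expected if the term occurred independently of the document.
        double expected = termTotal * docLength / totalTokens;

        // A term seen no more often than expected contributes nothing.
        if (tf <= expected) {
            return 0.0;
        }

        // Standardized divergence, dampened with a base-2 logarithm.
        double divergence = (tf - expected) / Math.sqrt(expected);
        return Math.log(divergence + 1.0) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Hypothetical collection of 1,000,000 tokens and a 100-token document.
        // A stopword-like term: 60,000 occurrences overall, 5 in the document.
        // Expected frequency is 6, so the observed 5 yields a score of 0.0.
        System.out.println(score(5, 100, 60_000, 1_000_000));

        // A rare term: 50 occurrences overall, 3 in the document.
        // Expected frequency is 0.005, so the score is positive.
        System.out.println(score(3, 100, 50, 1_000_000));
    }
}
```

Note that the sketch takes no tuning parameters: every input is a count read from the document or the collection, which matches the parameter-free property described above.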