  Lucene - Core / LUCENE-6818

Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr

    Details

    • Lucene Fields:
      New, Patch Available

      Description

      As explained in the write-up, implementations of many state-of-the-art ranking models have been added to Apache Lucene.

      This issue aims to add the DFI model, the non-parametric counterpart of the Divergence from Randomness (DFR) framework.

      DFI is both parameter-free and non-parametric:

      • parameter-free: it does not require any parameter tuning or training.
      • non-parametric: it does not make any assumptions about word frequency distributions on document collections.

      It is highly recommended not to remove stopwords (very common terms such as the, of, and, to, a, in, for, is, on, that) when using this similarity.
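
      To make the idea concrete, here is a minimal, self-contained sketch of a DFI-style term weight. It is not the code from the attached patch. The class name DfiSketch, the method names, and the exact formula are assumptions for illustration: the expected frequency under independence is taken as totalTermFreq * docLength / collectionLength, the divergence is the standardized difference (tf - expected) / sqrt(expected), and the score is log2(1 + divergence). Note that there are no tunable parameters, only collection statistics.

```java
/** Hypothetical sketch of a DFI-style term weight; not the implementation in LUCENE-6818.patch. */
final class DfiSketch {

    /**
     * Frequency we would expect for the term in this document if term occurrences
     * were independent of documents. Assumes collectionLength > 0.
     */
    static double expectedFrequency(long totalTermFreq, long docLength, long collectionLength) {
        return (double) totalTermFreq * docLength / collectionLength;
    }

    /**
     * Assumed scoring rule: zero when the term is no more frequent than expected,
     * otherwise log2(1 + standardized divergence from the expected frequency).
     */
    static double score(long tf, long totalTermFreq, long docLength, long collectionLength) {
        double expected = expectedFrequency(totalTermFreq, docLength, collectionLength);
        if (tf <= expected) {
            return 0.0; // no evidence that the document is "about" this term
        }
        double divergence = (tf - expected) / Math.sqrt(expected);
        return Math.log(1.0 + divergence) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Rare term: 50 occurrences in a 1,000,000-token collection, 5 of them in a 100-token document.
        System.out.println(score(5, 50, 100, 1_000_000));
        // Stopword-like term: 60,000 occurrences in the collection, 6 in the same 100-token document.
        System.out.println(score(6, 60_000, 100, 1_000_000));
    }
}
```

      Under these assumptions, a term that occurs in a document no more often than its collection-wide rate predicts contributes nothing to the score. That is one plausible reading of the stopword advice above: very common terms tend to be down-weighted by the model itself, so filtering them out beforehand is not needed.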

      For more information, see: "A nonparametric term weighting method for information retrieval based on measuring the divergence from independence"

        Attachments

        1. LUCENE-6818.patch
          21 kB
          Ahmet Arslan
        2. LUCENE-6818.patch
          21 kB
          Ahmet Arslan
        3. LUCENE-6818.patch
          23 kB
          Ahmet Arslan
        4. LUCENE-6818.patch
          21 kB
          Ahmet Arslan
        5. LUCENE-6818.patch
          19 kB
          Ahmet Arslan

              People

              • Assignee: Robert Muir (rcmuir)
              • Reporter: Ahmet Arslan (iorixxx)