  1. Lucene - Core
  2. LUCENE-6818

Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr

Details

    • New, Patch Available

    Description

      As explained in the write-up, many state-of-the-art ranking models have already been implemented in Apache Lucene.

      This issue aims to add the DFI model, which is the non-parametric counterpart of the Divergence from Randomness (DFR) framework.

      DFI is both parameter-free and non-parametric:

      • parameter-free: it does not require any parameter tuning or training.
      • non-parametric: it does not make any assumptions about word frequency distributions in document collections.
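
      To make the two properties above concrete, here is a minimal sketch of a DFI-style score. It is not taken from the attached patch: the class and method names are hypothetical, and it assumes the standardized independence measure, which is only one of the measures discussed in the DFI literature. The score depends only on collection statistics (no tunable constants), and it compares the observed term frequency with the frequency expected if the term and the document were independent, without assuming any particular frequency distribution.

```java
// Hypothetical illustration of DFI-style scoring; names are assumptions, not the patch's API.
final class DfiSketch {

    /** Term frequency expected in a document if term and document were independent. */
    static double expectedFrequency(long totalTermFreq, long docLength, long totalTokens) {
        return (double) totalTermFreq * docLength / totalTokens;
    }

    /**
     * DFI-style score using the standardized measure (assumed here):
     * (observed - expected) / sqrt(expected), passed through log2(x + 1).
     * Terms occurring no more often than expected contribute nothing.
     */
    static double score(double termFreq, long totalTermFreq, long docLength, long totalTokens) {
        double expected = expectedFrequency(totalTermFreq, docLength, totalTokens);
        if (termFreq <= expected) {
            return 0.0;
        }
        double measure = (termFreq - expected) / Math.sqrt(expected);
        return Math.log(measure + 1) / Math.log(2); // log base 2
    }

    public static void main(String[] args) {
        // Rare term: 100 occurrences in a 1,000,000-token collection, 3 in a 200-token document.
        System.out.println(score(3, 100, 200, 1_000_000));
        // Very common term: 50,000 occurrences in the collection, 10 in the same document.
        System.out.println(score(10, 50_000, 200, 1_000_000));
    }
}
```

      In this sketch, a very common term that appears in a document about as often as its collection-wide rate predicts receives a score of zero, while a rare term appearing more often than expected receives a positive score.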

      It is highly recommended not to remove stopwords (very common terms: the, of, and, to, a, in, for, is, on, that, etc.) when using this similarity.

      For more information, see: "A nonparametric term weighting method for information retrieval based on measuring the divergence from independence"

      Attachments

        1. LUCENE-6818.patch
          19 kB
          Ahmet Arslan
        2. LUCENE-6818.patch
          21 kB
          Ahmet Arslan
        3. LUCENE-6818.patch
          23 kB
          Ahmet Arslan
        4. LUCENE-6818.patch
          21 kB
          Ahmet Arslan
        5. LUCENE-6818.patch
          21 kB
          Ahmet Arslan

            People

              rcmuir Robert Muir
              iorixxx Ahmet Arslan