Lucene - Core / LUCENE-6818

Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr


    Details

    • Lucene Fields:
      New, Patch Available

      Description

      As explained in the write-up, implementations of many state-of-the-art ranking models have already been added to Apache Lucene.

      This issue proposes adding the Divergence from Independence (DFI) model, which is the non-parametric counterpart of the Divergence from Randomness (DFR) framework.

      DFI is both parameter-free and non-parametric:

      • parameter-free: it does not require any parameter tuning or training.
      • non-parametric: it does not make any assumptions about word frequency distributions in document collections.

      With this similarity, it is highly recommended not to remove stopwords (very common terms such as the, of, and, to, a, in, for, is, on, that).

      For more information see: A nonparametric term weighting method for information retrieval based on measuring the divergence from independence
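
      To make the idea concrete, here is a minimal, self-contained sketch of a DFI-style score. It is not the code in the attached patch. It assumes the standardized independence measure, in which a term's observed frequency in a document is compared with the frequency expected if terms and documents were independent. The class name `DfiSketch`, its method names, and the sample statistics are all hypothetical and chosen only for illustration.

```java
/**
 * Illustrative sketch of a Divergence from Independence (DFI) style score.
 * Assumption: the "standardized" measure (observed - expected) / sqrt(expected).
 * This is NOT the patch attached to this issue; all names are hypothetical.
 */
public final class DfiSketch {

    private DfiSketch() {}

    /** Term frequency expected in a document if terms and documents were independent. */
    public static double expected(long totalTermFreq, long docLen, long numberOfFieldTokens) {
        return (double) totalTermFreq * docLen / numberOfFieldTokens;
    }

    /** Standardized distance between the observed and the expected frequency. */
    public static double standardized(double freq, double expected) {
        return (freq - expected) / Math.sqrt(expected);
    }

    /**
     * Score of one term in one document. Returns 0 when the term occurs
     * no more often than independence predicts.
     */
    public static double score(double freq, long docLen,
                               long totalTermFreq, long numberOfFieldTokens) {
        double e = expected(totalTermFreq, docLen, numberOfFieldTokens);
        if (freq <= e) {
            return 0.0;
        }
        // log base 2 of (measure + 1)
        return Math.log(standardized(freq, e) + 1.0) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Hypothetical collection statistics: 1,000,000 tokens in the field.
        long tokens = 1_000_000L;
        long docLen = 100L;

        // A rare term (1,000 occurrences overall) seen 5 times in the document.
        System.out.println("rare term:   " + score(5, docLen, 1_000L, tokens));

        // A very common term (50,000 occurrences overall) seen once in the document.
        System.out.println("common term: " + score(1, docLen, 50_000L, tokens));
    }
}
```

      Note that the sketch takes no tunable parameters: everything it needs comes from collection statistics. In this sketch, a very common term that occurs at or below its expected rate contributes a score of 0 without any stopword list being applied.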

        Attachments

        1. LUCENE-6818.patch
          21 kB
          Ahmet Arslan
        2. LUCENE-6818.patch
          21 kB
          Ahmet Arslan
        3. LUCENE-6818.patch
          23 kB
          Ahmet Arslan
        4. LUCENE-6818.patch
          21 kB
          Ahmet Arslan
        5. LUCENE-6818.patch
          19 kB
          Ahmet Arslan


            People

            • Assignee:
              rcmuir Robert Muir
              Reporter:
              iorixxx Ahmet Arslan
