Lucene - Core / LUCENE-6818

Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr


Details

    • New, Patch Available

    Description

      As explained in the write-up, implementations of many state-of-the-art ranking models have been added to Apache Lucene.

      This issue aims to add the DFI model, the non-parametric counterpart of the Divergence from Randomness (DFR) framework.

      DFI is both parameter-free and non-parametric:

      • parameter-free: it does not require any parameter tuning or training.
      • non-parametric: it does not make any assumptions about word frequency distributions in document collections.

      It is highly recommended not to remove stopwords (very common terms such as the, of, and, to, a, in, for, is, on, that) with this similarity.

      For more information, see "A nonparametric term weighting method for information retrieval based on measuring the divergence from independence".
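
      To make the idea concrete, here is a minimal, self-contained sketch of a DFI-style score. It is not the code in the attached patches: the class name DfiSketch, the method names, the choice of the standardized independence measure, and the sample statistics are all assumptions made for illustration. The sketch compares the observed term frequency in a document with the frequency expected if the term and the document were independent, and scores only the excess. No tunable parameter appears anywhere, and a very common term that occurs no more often than expected contributes nothing, which is one way to see why stopword removal is unnecessary.

```java
public final class DfiSketch {
    private DfiSketch() {}

    /**
     * Frequency expected in a document of length docLen if the term were
     * independent of the document. The +1 terms are smoothing assumed here
     * to avoid division by zero on an empty field.
     */
    public static double expectedFreq(long totalTermFreq, long numberOfFieldTokens, double docLen) {
        return (totalTermFreq + 1) * docLen / (numberOfFieldTokens + 1);
    }

    /**
     * DFI-style score using the standardized measure
     * (freq - expected) / sqrt(expected), passed through log2(measure + 1).
     * Returns 0 when the term occurs no more often than expected.
     */
    public static double score(double freq, double docLen, long totalTermFreq, long numberOfFieldTokens) {
        double expected = expectedFreq(totalTermFreq, numberOfFieldTokens, docLen);
        if (freq <= expected) {
            return 0.0; // no evidence of dependence between term and document
        }
        double measure = (freq - expected) / Math.sqrt(expected);
        return Math.log(measure + 1) / Math.log(2);
    }

    public static void main(String[] args) {
        double docLen = 100.0; // hypothetical document length in tokens

        // Hypothetical stopword-like term: about 5% of all tokens in the field.
        double common = score(5, docLen, 50_000L, 1_000_000L);

        // Hypothetical rare term: about 0.01% of all tokens in the field.
        double rare = score(5, docLen, 99L, 999_999L);

        System.out.println("common term score: " + common);
        System.out.println("rare term score:   " + rare);
    }
}
```

      With these made-up statistics, five occurrences of the common term in a 100-token document are no more than independence would predict, so the score is zero, while five occurrences of the rare term are far above expectation and receive a positive score.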

      Attachments

        1. LUCENE-6818.patch
          19 kB
          Ahmet Arslan
        2. LUCENE-6818.patch
          21 kB
          Ahmet Arslan
        3. LUCENE-6818.patch
          23 kB
          Ahmet Arslan
        4. LUCENE-6818.patch
          21 kB
          Ahmet Arslan
        5. LUCENE-6818.patch
          21 kB
          Ahmet Arslan


          People

            rcmuir Robert Muir
            iorixxx Ahmet Arslan
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved:
