  1. Lucene - Core
  2. LUCENE-6986

Add more DFI independence measures

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Since LUCENE-6818, we have DFISimilarity, which implements normalized chi-squared distance.

      But there are other alternatives (as described in http://trec.nist.gov/pubs/trec21/papers/irra.web.nb.pdf):

      • normalized chi-squared: "can be used for tasks that require high precision, against both short and long queries"
      • standardized: "good at tasks that require high recall and high precision, especially against short queries composed of a few words as in the case of Internet searches"
      • saturated: "for tasks that require high recall against long queries"

      I think we should just provide the three independence measures and let the user choose, similar to how we do for DFR/IB/etc.
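
      To make the proposal concrete, here is a minimal, self-contained sketch of what "pluggable independence measure" could look like. It is not the patch and not the Lucene API: `DfiSketch`, `Measure`, and `score` are hypothetical names. The three formulas and the +1 smoothing in the expected frequency are assumptions and should be checked against the paper and the attached patch.

```java
import java.util.function.DoubleBinaryOperator;

/** Hypothetical sketch of selectable DFI independence measures; not the Lucene API. */
final class DfiSketch {

    /** Each measure maps (observed freq, expected freq) to a divergence value. */
    enum Measure {
        // Assumed formulas; verify against the paper before relying on them.
        CHI_SQUARED((f, e) -> (f - e) * (f - e) / e),   // normalized chi-squared
        STANDARDIZED((f, e) -> (f - e) / Math.sqrt(e)), // standardized
        SATURATED((f, e) -> (f - e) / e);               // saturated

        private final DoubleBinaryOperator fn;

        Measure(DoubleBinaryOperator fn) {
            this.fn = fn;
        }

        double apply(double freq, double expected) {
            return fn.applyAsDouble(freq, expected);
        }
    }

    /**
     * Scores one term in one document.
     *
     * @param freq          term frequency in the document
     * @param docLen        number of tokens in the document
     * @param totalTermFreq term frequency across the whole field
     * @param fieldTokens   number of tokens across the whole field
     */
    static double score(Measure measure, double freq, double docLen,
                        double totalTermFreq, double fieldTokens) {
        // Expected frequency if the term were spread evenly over the collection.
        // The +1 smoothing is an assumption of this sketch.
        double expected = (totalTermFreq + 1) * docLen / (fieldTokens + 1);

        // A term that occurs no more often than chance contributes nothing.
        if (freq <= expected) {
            return 0.0;
        }

        // log2(measure + 1)
        return Math.log(measure.apply(freq, expected) + 1) / Math.log(2);
    }
}
```

      With this shape, the user picks the measure once when configuring the similarity, e.g. `DfiSketch.score(DfiSketch.Measure.SATURATED, freq, docLen, totalTermFreq, fieldTokens)`, in the same spirit as choosing the components of DFR or IB.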

      Attachments

        1. LUCENE-6986.patch
          19 kB
          Robert Muir
        2. LUCENE-6986.patch
          18 kB
          Robert Muir
        3. LUCENE-6986.patch
          18 kB
          Robert Muir

          People

            Assignee: Unassigned
            Reporter: Robert Muir (rcmuir)
