Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Since LUCENE-6818, we have DFISimilarity, which implements normalized chi-squared distance.

      But there are alternatives (described in http://trec.nist.gov/pubs/trec21/papers/irra.web.nb.pdf):

      • normalized chi-squared: "can be used for tasks that require high precision, against both short and long queries"
      • standardized: "good at tasks that require high recall and high precision, especially against short queries composed of a few words as in the case of Internet searches"
      • saturated: "for tasks that require high recall against long queries"

      I think we should just provide the three independence measures and let the user choose, similar to how we do DFR/IB/etc.
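
      For illustration only, here is a minimal standalone sketch of how the three measures could plug into a single scoring function. It is not the code from the attached patch: the class name `IndependenceSketch`, the method names, and the `log2(measure + 1)` wrapper are assumptions, and the formulas follow the usual definitions (observed term frequency `freq` against the frequency `expected` under independence) as I understand them from the paper.

```java
import java.util.function.DoubleBinaryOperator;

/** Hypothetical sketch of pluggable independence measures; not Lucene's API. */
final class IndependenceSketch {

    /** Saturated measure: relative excess of observed over expected frequency. */
    static double saturated(double freq, double expected) {
        return (freq - expected) / expected;
    }

    /** Standardized measure: excess scaled by the square root of the expectation. */
    static double standardized(double freq, double expected) {
        return (freq - expected) / Math.sqrt(expected);
    }

    /** Normalized chi-squared measure: squared excess over the expectation. */
    static double chiSquared(double freq, double expected) {
        double diff = freq - expected;
        return diff * diff / expected;
    }

    /**
     * Assumed scoring wrapper: terms occurring no more often than expected
     * contribute nothing; otherwise the chosen measure is log-damped.
     */
    static double score(DoubleBinaryOperator measure, double freq, double expected) {
        if (freq <= expected) {
            return 0.0;
        }
        double m = measure.applyAsDouble(freq, expected);
        return Math.log(m + 1) / Math.log(2); // log base 2
    }
}
```

      A caller would pick the measure by passing a method reference, e.g. `IndependenceSketch.score(IndependenceSketch::standardized, freq, expected)`, which mirrors the "let the user choose" idea above.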

        Attachments

        1. LUCENE-6986.patch
          18 kB
          Robert Muir
        2. LUCENE-6986.patch
          18 kB
          Robert Muir
        3. LUCENE-6986.patch
          19 kB
          Robert Muir

            People

            • Assignee:
              Unassigned
              Reporter:
              rcmuir Robert Muir