  Lucene - Core / LUCENE-295

[PATCH] MultiSearcher problems with Similarity.docFreq()

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: None
    • Component/s: core/search
    • Labels: None
    • Environment: Operating System: other
      Platform: All
    • Bugzilla Id: 31841

    Description

      When MultiSearcher invokes its subsearchers, Similarity.docFreq() accesses each
      subsearcher's own docFreq(). As a result, idf values are computed locally for
      each index rather than globally, so ranking across multiple indices is not
      equivalent to ranking across the entire global collection.
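
      To make the local-versus-global difference concrete, here is a minimal
      sketch. The idf formula is modeled on the one I believe DefaultSimilarity
      uses (log(numDocs / (docFreq + 1)) + 1); the class name IdfDemo and the
      index sizes and document frequencies are invented for illustration.

```java
public class IdfDemo {
    // Assumed formula, modeled on DefaultSimilarity.idf(docFreq, numDocs).
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int[] docFreq = {10, 500};    // hypothetical per-index document frequencies
        int[] maxDoc = {1000, 1000};  // hypothetical per-index sizes

        int globalDocFreq = 0;
        int globalMaxDoc = 0;
        for (int i = 0; i < docFreq.length; i++) {
            // What each subsearcher computes on its own today.
            System.out.println("index " + i + " local idf = " + idf(docFreq[i], maxDoc[i]));
            globalDocFreq += docFreq[i];
            globalMaxDoc += maxDoc[i];
        }
        // What a single index over the whole collection would compute.
        System.out.println("global idf = " + idf(globalDocFreq, globalMaxDoc));
    }
}
```

      The same term gets a high idf in the index where it is rare and a low idf
      in the index where it is common, so scores from the two indices are not
      comparable once their hits are merged.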

      The attached files (if I can figure out how to attach them) provide a potential
      partial solution for this. They properly fix a simple test case, RankingTest,
      that was provided by Daniel Naber.

      The changes are:
      1. Searcher: Add a topmostSearcher field with getter and setter to record
      the outermost Searcher. It defaults to this.
      2. MultiSearcher: Pass down the topmostSearcher when creating the subsearchers.
      3. IndexSearcher: Call Query.weight() everywhere with the topmostSearcher
      instead of this.
      4. Query: Provide a default implementation of Query.combine() so that
      MultiSearcher works with all queries.
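
      Changes 1 to 3 can be sketched roughly as follows. This is not the
      attached patch: the Toy* classes are hypothetical stand-ins for Searcher,
      IndexSearcher and MultiSearcher, the idf() method stands in for weight
      creation, and the idf formula is assumed. Nested multi-searchers are not
      handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Searcher: records the outermost searcher (change 1).
abstract class ToySearcher {
    private ToySearcher topmostSearcher = this; // defaults to this

    ToySearcher getTopmostSearcher() { return topmostSearcher; }
    void setTopmostSearcher(ToySearcher s) { topmostSearcher = s; }

    abstract int docFreq(String term);
    abstract int maxDoc();

    // Stand-in for weight creation (change 3): statistics are read from the
    // topmost searcher instead of from this.
    double idf(String term) {
        ToySearcher top = getTopmostSearcher();
        return Math.log(top.maxDoc() / (double) (top.docFreq(term) + 1)) + 1.0;
    }
}

// Hypothetical stand-in for IndexSearcher over a single index.
class ToyIndexSearcher extends ToySearcher {
    private final Map<String, Integer> docFreqs;
    private final int maxDoc;

    ToyIndexSearcher(Map<String, Integer> docFreqs, int maxDoc) {
        this.docFreqs = docFreqs;
        this.maxDoc = maxDoc;
    }

    @Override int docFreq(String term) { return docFreqs.getOrDefault(term, 0); }
    @Override int maxDoc() { return maxDoc; }
}

// Hypothetical stand-in for MultiSearcher: passes itself down (change 2).
class ToyMultiSearcher extends ToySearcher {
    private final ToySearcher[] subSearchers;

    ToyMultiSearcher(ToySearcher... subSearchers) {
        this.subSearchers = subSearchers;
        for (ToySearcher s : subSearchers) {
            s.setTopmostSearcher(this);
        }
    }

    @Override int docFreq(String term) {
        int sum = 0;
        for (ToySearcher s : subSearchers) sum += s.docFreq(term);
        return sum;
    }

    @Override int maxDoc() {
        int sum = 0;
        for (ToySearcher s : subSearchers) sum += s.maxDoc();
        return sum;
    }
}

public class TopmostSearcherSketch {
    public static void main(String[] args) {
        Map<String, Integer> freqA = new HashMap<>();
        freqA.put("lucene", 10);
        Map<String, Integer> freqB = new HashMap<>();
        freqB.put("lucene", 500);

        ToyIndexSearcher a = new ToyIndexSearcher(freqA, 1000);
        ToyIndexSearcher b = new ToyIndexSearcher(freqB, 1000);
        System.out.println("standalone: " + a.idf("lucene") + " vs " + b.idf("lucene"));

        new ToyMultiSearcher(a, b); // wires both subsearchers to the outer searcher
        System.out.println("wrapped:    " + a.idf("lucene") + " vs " + b.idf("lucene"));
    }
}
```

      Once wrapped, both subsearchers report the same idf for the term because
      both read docFreq() and maxDoc() from the outer searcher.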

      Problems or possible problems I see:
      1. This does not address the same issue with RemoteSearchable.
      RemoteSearchable is not a Searcher, nor can it be due to lack of multiple
      inheritance in Java, but Query.weight() requires a Searcher. Perhaps
      Query.weight() should be changed to take a Searchable, but this requires
      changing many places and I suspect would break apps.
      2. There may be other places that topmostSearcher should be used instead of this.
      3. The default implementation for Query.combine() is a guess on my part - it
      works for TermQuery. It's fragile in that the default implementation will hide
      bugs caused by queries that inadvertently omit a more precise Query.combine()
      method.
      4. The prior comment on Query.combine() indicates that whoever wrote it was
      fully aware of this problem and so probably had another usage in mind. The
      whole issue may therefore just be Daniel's usage in the test case. That is
      not apparent to me, so I probably don't understand something.
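
      The fragility described in point 3 can be illustrated with a toy model.
      Everything here is hypothetical: SketchQuery, UnionSketchQuery and the
      "return the first query" default are invented for illustration and are
      not the default shipped in the attached files.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical query holding the terms it expanded to in one index.
class SketchQuery {
    final Set<String> terms = new LinkedHashSet<>();

    SketchQuery(String... terms) {
        this.terms.addAll(Arrays.asList(terms));
    }

    // Permissive default (a guess): treat all per-index queries as
    // interchangeable and return the first one. Correct for a single-term
    // query, silently lossy for anything that expands differently per index.
    SketchQuery combine(SketchQuery[] queries) {
        return queries[0];
    }
}

// A query type that supplies the more precise combine(): union of all terms.
class UnionSketchQuery extends SketchQuery {
    UnionSketchQuery(String... terms) {
        super(terms);
    }

    @Override
    SketchQuery combine(SketchQuery[] queries) {
        UnionSketchQuery merged = new UnionSketchQuery();
        for (SketchQuery q : queries) {
            merged.terms.addAll(q.terms);
        }
        return merged;
    }
}

public class CombineSketch {
    public static void main(String[] args) {
        // The same logical query, expanded against two different indices.
        SketchQuery[] lossy = { new SketchQuery("apple"), new SketchQuery("apply") };
        System.out.println("default:  " + lossy[0].combine(lossy).terms);

        SketchQuery[] precise = { new UnionSketchQuery("apple"), new UnionSketchQuery("apply") };
        System.out.println("override: " + precise[0].combine(precise).terms);
    }
}
```

      With the permissive default, the term found only in the second index is
      dropped without any error, which is the kind of hidden bug point 3 warns
      about. A default that threw an exception instead would surface the
      missing override, at the cost of MultiSearcher not working with every
      query out of the box.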

      Attachments

        1. ASF.LICENSE.NOT.GRANTED--multisearcher.diff
          14 kB
          Wolf Siberski
        2. ASF.LICENSE.NOT.GRANTED--combine-fix2.diff
          13 kB
          Wolf Siberski
        3. ASF.LICENSE.NOT.GRANTED--combine-fix.patch
          5 kB
          Chuck Williams
        4. ASF.LICENSE.NOT.GRANTED--multisearcher-deprecation.diff
          19 kB
          Wolf Siberski
        5. ASF.LICENSE.NOT.GRANTED--multisearcher-deprecation.diff
          20 kB
          Wolf Siberski
        6. ASF.LICENSE.NOT.GRANTED--multisearcher-2005-04-19.diff
          43 kB
          Otis Gospodnetic
        7. ASF.LICENSE.NOT.GRANTED--multisearcher-2005-02-22c.diff
          48 kB
          Wolf Siberski
        8. ASF.LICENSE.NOT.GRANTED--multisearcher-2005-02-18b.diff
          47 kB
          Wolf Siberski
        9. ASF.LICENSE.NOT.GRANTED--multisearcher.diff
          20 kB
          Wolf Siberski
        10. ASF.LICENSE.NOT.GRANTED--multisearcher2.diff
          13 kB
          Wolf Siberski
        11. ASF.LICENSE.NOT.GRANTED--multisearcher.diff
          2 kB
          Wolf Siberski
        12. ASF.LICENSE.NOT.GRANTED--patch.diff
          4 kB
          Daniel Naber
        13. ASF.LICENSE.NOT.GRANTED--MultiSearcherPatch.zip
          7 kB
          Chuck Williams

          People

            Assignee: java-dev@lucene.apache.org Lucene Developers
            Reporter: chuck@manawiz.com Chuck Williams