SOLR-1158 (project: Solr)

Scoring: "numDocs" should be the number of documents after applying filters, not the entire index


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.4
    • Fix Version/s: None
    • Component/s: search
    • Labels: None

    Description

      I'd like to put different types of things to search for in my Solr index. I use a "type" field to discriminate between these types, and my "id" primary key field incorporates the type (e.g. "FooType:53") to ensure uniqueness. A problem I see with this approach is that the idf (inverse document frequency) component of the score is based on the entire index, not on the type that I'm querying. In particular, the "numDocs" passed to the Similarity.java implementation is the total number of documents in the index. I think it would be more accurate for numDocs to be the filtered number of docs, that is, the number of docs remaining after the filter queries are applied.
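
      To make the effect concrete, here is a minimal, self-contained sketch that does not use Solr or Lucene at all. It assumes the classic Lucene DefaultSimilarity idf formula, 1 + ln(numDocs / (docFreq + 1)), and a toy in-memory "index"; the names FilteredIdf, Doc, filteredIdf and sampleIndex are hypothetical. Note that it restricts docFreq as well as numDocs to the filtered subset: shrinking only numDocs while keeping an index-wide docFreq would make the ratio inconsistent (docFreq could even exceed numDocs).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class FilteredIdf {

    // Toy document (hypothetical): a "type" discriminator plus the terms it contains.
    record Doc(String type, Set<String> terms) {}

    // Classic Lucene-style idf (assumed): 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Computes idf for `term` over only the docs that pass `filter`,
    // so both numDocs and docFreq come from the filtered subset.
    static double filteredIdf(List<Doc> index, Predicate<Doc> filter, String term) {
        long numDocs = index.stream().filter(filter).count();
        long docFreq = index.stream()
                .filter(filter)
                .filter(d -> d.terms().contains(term))
                .count();
        return idf(docFreq, numDocs);
    }

    static List<Doc> sampleIndex() {
        List<Doc> index = new ArrayList<>();
        // 3 FooType docs, 2 of which contain "widget"
        index.add(new Doc("FooType", Set.of("widget", "blue")));
        index.add(new Doc("FooType", Set.of("widget")));
        index.add(new Doc("FooType", Set.of("gizmo")));
        // 5 BarType docs, none of which contain "widget"
        for (int i = 0; i < 5; i++) {
            index.add(new Doc("BarType", Set.of("gadget")));
        }
        return index;
    }

    public static void main(String[] args) {
        List<Doc> index = sampleIndex();
        double whole = filteredIdf(index, d -> true, "widget");
        double fooOnly = filteredIdf(index, d -> d.type().equals("FooType"), "widget");
        System.out.printf("whole index idf  = %.4f%n", whole);   // 1 + ln(8/3), about 1.9808
        System.out.printf("FooType-only idf = %.4f%n", fooOnly); // 1 + ln(3/3) = 1.0000
    }
}
```

      Within FooType, "widget" appears in most documents, yet the index-wide statistics make it look rare because the unrelated BarType documents inflate numDocs.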

      The only issue I see with this, which may or may not be a problem, is that the scores (and thus potentially the result ordering, if sorting by score) would change depending on which filters are applied. That could be counter-intuitive in a faceting UI. Perhaps only a certain filter or filters could be marked as lowering numDocs for scoring. Such a configuration choice strikes me as belonging in the schema.
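
      The "only certain filters lower numDocs" idea can be sketched as follows. This is a hypothetical illustration, not an existing Solr feature: SCOPING_FIELDS stands in for the schema-level configuration the proposal describes, and ScopedStats, Doc, Filter, scopedIdf and sampleIndex are invented names. It again assumes the classic Lucene idf formula. A filter on a scoping field ("type") narrows the statistics, while a facet drill-down on any other field ("color") leaves the scores unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ScopedStats {

    // Toy document (hypothetical): single-valued fields plus its text terms.
    record Doc(Map<String, String> fields, Set<String> terms) {}

    // A simple field:value filter query.
    record Filter(String field, String value) {
        boolean matches(Doc d) {
            return value.equals(d.fields().get(field));
        }
    }

    // HYPOTHETICAL config: filters on these fields also narrow scoring statistics.
    // No existing Solr schema attribute is implied.
    static final Set<String> SCOPING_FIELDS = Set.of("type");

    // Classic Lucene-style idf (assumed): 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Only scoping filters restrict numDocs and docFreq; other filters are
    // ignored for statistics (result-set filtering is not shown here).
    static double scopedIdf(List<Doc> index, List<Filter> filters, String term) {
        List<Filter> scoping = filters.stream()
                .filter(f -> SCOPING_FIELDS.contains(f.field()))
                .toList();
        List<Doc> statsDocs = index.stream()
                .filter(d -> scoping.stream().allMatch(f -> f.matches(d)))
                .toList();
        long numDocs = statsDocs.size();
        long docFreq = statsDocs.stream().filter(d -> d.terms().contains(term)).count();
        return idf(docFreq, numDocs);
    }

    static List<Doc> sampleIndex() {
        List<Doc> index = new ArrayList<>();
        index.add(new Doc(Map.of("type", "FooType", "color", "blue"), Set.of("widget")));
        index.add(new Doc(Map.of("type", "FooType", "color", "red"), Set.of("widget")));
        index.add(new Doc(Map.of("type", "FooType", "color", "red"), Set.of("gizmo")));
        for (int i = 0; i < 5; i++) {
            index.add(new Doc(Map.of("type", "BarType", "color", "red"), Set.of("gadget")));
        }
        return index;
    }

    public static void main(String[] args) {
        List<Doc> index = sampleIndex();
        Filter fooType = new Filter("type", "FooType");
        Filter blue = new Filter("color", "blue");
        // Adding the color drill-down does not change the idf.
        System.out.printf("type       -> idf %.4f%n", scopedIdf(index, List.of(fooType), "widget"));
        System.out.printf("type+color -> idf %.4f%n", scopedIdf(index, List.of(fooType, blue), "widget"));
        System.out.printf("no filters -> idf %.4f%n", scopedIdf(index, List.of(), "widget"));
    }
}
```

      Keeping the set of scoping fields as explicit configuration, rather than letting every filter query affect scoring, is what keeps scores stable as a user clicks through facets.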


    People

      Assignee: Unassigned
      Reporter: David Smiley (dsmiley)
      Votes: 0
      Watchers: 1
