Lucene - Core
  LUCENE-1543

Field specified norms in MatchAllDocumentsScorer

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.9
    • Component/s: core/query/scoring
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      This patch makes it possible to optionally set a field whose norms are factored into the score when a MatchAllDocsQuery is scored.
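      Conceptually, the scoring this enables can be sketched in plain Java without any Lucene classes. Every document matches. When a norms field is given, the constant score is multiplied by that document's norm for the field. The class and method names below are hypothetical, the norms are passed in already decoded to floats, and this is not the patch's actual scorer code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MatchAllNormsSketch {
    /** One scored document, as a scorer would report it. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Visits every document 0..maxDoc-1. Without norms (decodedNorms == null),
     * every document gets the same constant score. With norms, that constant is
     * multiplied by the document's (already decoded) norm for the chosen field.
     */
    static List<Hit> matchAll(int maxDoc, float constantScore, float[] decodedNorms) {
        List<Hit> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            float score = (decodedNorms == null)
                    ? constantScore
                    : constantScore * decodedNorms[doc];
            hits.add(new Hit(doc, score));
        }
        // Highest score first; List.sort is stable, so ties stay in doc-id order.
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return hits;
    }

    public static void main(String[] args) {
        float[] norms = {1.0f, 20.0f, 212.0f}; // hypothetical decoded norms for docs 0, 1, 2
        for (Hit h : matchAll(3, 1.0f, null)) System.out.println(h.doc + " " + h.score);
        for (Hit h : matchAll(3, 1.0f, norms)) System.out.println(h.doc + " " + h.score);
    }
}
```

      With no norms field, all scores tie and the hits come back in document order; with norms, the ordering follows the per-document norm values.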

      From the test case:

          // addDoc(text, writer, boost) is a helper from the test class that is not shown here;
          // judging by the assertions, it indexes `text` in the stored field "key" with the given boost.
          RAMDirectory dir = new RAMDirectory();
          IndexWriter iw = new IndexWriter(dir, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
          iw.setMaxBufferedDocs(2);  // force multi-segment
          addDoc("one", iw, 1f);
          addDoc("two", iw, 20f);
          addDoc("three four", iw, 300f);
          iw.close();

          IndexReader ir = IndexReader.open(dir);
          IndexSearcher is = new IndexSearcher(ir);
          ScoreDoc[] hits;

          // norms scoring turned off: all docs score the same, so hits come back in doc-id order
          hits = is.search(new MatchAllDocsQuery(), null, 1000).scoreDocs;
          assertEquals(3, hits.length);
          assertEquals("one", ir.document(hits[0].doc).get("key"));
          assertEquals("two", ir.document(hits[1].doc).get("key"));
          assertEquals("three four", ir.document(hits[2].doc).get("key"));

          // norms scoring turned on: scores are weighted by the norms of field "key"
          MatchAllDocsQuery normsQuery = new MatchAllDocsQuery("key");
          hits = is.search(normsQuery, null, 1000).scoreDocs;
          assertEquals(3, hits.length);

          assertEquals("three four", ir.document(hits[0].doc).get("key"));
          assertEquals("two", ir.document(hits[1].doc).get("key"));
          assertEquals("one", ir.document(hits[2].doc).get("key"));
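      The expected order in the norms-on assertions follows from the field norms. Assume a DefaultSimilarity-style length norm of 1/sqrt(numTerms), and assume addDoc applies the given value as the boost of field "key". Under those assumptions, the raw norms are 1 for "one", 20 for "two" and 300/sqrt(2), about 212, for "three four". The sketch below only does that arithmetic. It ignores the lossy one-byte norm encoding and uses whitespace splitting as a stand-in for StandardAnalyzer; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

public class ExpectedNormOrder {
    /** Assumed length norm in the style of DefaultSimilarity: 1 / sqrt(numTerms). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Raw norm before byte encoding: field boost times length norm (whitespace tokens assumed). */
    static float rawNorm(String text, float boost) {
        int numTerms = text.trim().split("\\s+").length;
        return boost * lengthNorm(numTerms);
    }

    /** Returns the texts ordered by descending raw norm. */
    static String[] rankByNorm(String[] texts, float[] boosts) {
        Integer[] idx = new Integer[texts.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble((Integer k) -> rawNorm(texts[k], boosts[k])).reversed());
        String[] ranked = new String[texts.length];
        for (int i = 0; i < idx.length; i++) ranked[i] = texts[idx[i]];
        return ranked;
    }

    public static void main(String[] args) {
        String[] texts = {"one", "two", "three four"};
        float[] boosts = {1f, 20f, 300f};
        for (int i = 0; i < texts.length; i++) {
            System.out.println(texts[i] + " -> " + rawNorm(texts[i], boosts[i]));
        }
        System.out.println(Arrays.toString(rankByNorm(texts, boosts)));
    }
}
```

      The two-term field gets a smaller length norm, but its much larger boost still puts it first, which is the order the test asserts.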
      

        Activity

        Yonik Seeley added a comment -

        Couldn't you just use a TermQuery?
        Or a BooleanQuery with a MatchAllDocsQuery and an optional TermQuery?

        Karl Wettin added a comment -

        Couldn't you just use a TermQuery? Or a BooleanQuery with a MatchAllDocsQuery and an optional TermQuery?

        Wouldn't that require a TermQuery that matches all documents, i.e. adding a term to a field in every document?

        The following doesn't really belong in this issue, but still: it's related to column-stride payloads (LUCENE-1231). I've been considering adding a new document-level "norms" field for a couple of years now. Eight more bits at the document level would allow moving general document boosting out of the per-field norm byte, and would increase the resolution of length normalization and per-field boosts quite a bit at a low cost.

        (I hope that is not yet another can of worms I get to open.)
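        To illustrate why resolution is the issue: in Lucene 2.x each norm is a single byte. As we understand it, Similarity.encodeNorm produces that byte via SmallFloat.floatToByte315, which Lucene documents as a three-bit mantissa and five-bit exponent with roughly one significant decimal digit of accuracy. The class below is a standalone re-implementation written for illustration, not the library source. It shows that boosts of 1.0, 1.1 and 1.2 all collapse to the same byte, while 1.25 survives the round trip.

```java
public class NormEncodingSketch {
    /** Packs a float into one byte, keeping only the exponent and the top mantissa bits. */
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);             // drop the low-order mantissa bits
        int fzero = (63 - 15) << 3;                    // offset so the zero exponent sits at 15
        if (smallfloat <= fzero) {                     // zero, negative, or underflow
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= fzero + 0x100) {             // overflow: clamp to the largest code
            return (byte) -1;
        }
        return (byte) (smallfloat - fzero);
    }

    /** Inverse of floatToByte315: expands the byte back into a float. */
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float f : new float[] {1.0f, 1.1f, 1.2f, 1.25f}) {
            byte b = floatToByte315(f);
            System.out.println(f + " -> byte " + (b & 0xff) + " -> " + byte315ToFloat(b));
        }
    }
}
```

        Because the document boost, the field boost and the length norm are multiplied together before this encoding, they all share this one coarse byte.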

        Michael McCandless added a comment -

        It seems like this is quite similar to function queries, which also
        match all docs but then let you set your own score for each doc (e.g.
        based on values from FieldCache).

        Once we create column-stride fields, and merge norms into it, then
        presumably MatchAllDocsQuery & function queries would simply be the
        same thing.

        I've been considering adding a new document-level "norms" field for a couple of years now. Eight more bits at the document level would allow moving general document boosting out of the per-field norm byte, and would increase the resolution of length normalization and per-field boosts quite a bit at a low cost.

        This seems interesting: it would double the precision for boosting,
        but it would cost the RAM of one more norms-enabled field (i.e. a
        byte[] of length maxDoc()). It would also slow down scoring, since
        both the document's and the field's contributions would have to be
        looked up and multiplied in. I don't have a good sense of how often
        the added precision helps, though. Karl, have you tested that? E.g.
        using function queries you could easily emulate "per-document norms".
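        The trade-off described above can be put into numbers with a small hypothetical sketch. Norms cost one byte per document for each norms-enabled field, so a document-level norm adds another maxDoc bytes. Emulating it with a function query means an extra per-document value lookup and a multiply for every hit. The names below are illustrative only. Note that a FieldCache-backed float[] would cost 4 bytes per document rather than 1.

```java
public class PerDocNormSketch {
    /** Norm storage: one byte per document for each norms-enabled field. */
    static long normBytes(int maxDoc, int normsFields) {
        return (long) maxDoc * normsFields;
    }

    /**
     * Hypothetical per-document norm emulation: the field's decoded norm times a
     * separate per-document factor (e.g. a value a function query reads from a
     * cache). The second array lookup and the multiply are the extra per-hit work.
     */
    static float score(float[] fieldNorms, float[] docFactors, int doc) {
        return fieldNorms[doc] * docFactors[doc];
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        System.out.println("one norms field: " + normBytes(maxDoc, 1) + " bytes");
        System.out.println("plus document-level norms: " + normBytes(maxDoc, 2) + " bytes");
        float[] fieldNorms = {1.0f, 0.625f};
        float[] docFactors = {2.0f, 4.0f};
        System.out.println("doc 1 score: " + score(fieldNorms, docFactors, 1));
    }
}
```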

        Michael McCandless added a comment -

        Karl, is there a reason why a function query can't be used in your situation? It seems like it should work?

        Karl Wettin added a comment -

        Karl, is there a reason why a function query can't be used in your situation? It seems like it should work?

        I'm sure it would. :)

        However, I don't understand why you think it is a more correct/nicer/better solution than this patch. This is how I reason: if norms scoring is available in all the other low-level queries, then it also makes sense to have it in the low-level MatchAllDocsQuery.

        Michael McCandless added a comment -

        This is how I reason: if norms scoring is available in all the other low-level queries, then it also makes sense to have it in the low-level MatchAllDocsQuery.

        OK I agree. Since the index would already have norms for the field,
        it makes sense to provide a way to tap solely those norms as the basis
        for scoring.

        Michael McCandless added a comment -

        Thanks Karl!


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Karl Wettin
          • Votes:
            0
            Watchers:
            0