Lucene - Core / LUCENE-5103

join on single-valued field with deleted docs scores too few docs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.3.1
    • Fix Version/s: 4.4
    • Component/s: modules/join
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      TermsIncludingScoreQuery has an inner class, SVInnerScorer, that is used when the "to" side of a join is single-valued. Its nextDocOutOfOrder() method is faulty when the index contains deleted documents and the join matches one of them: it returns NO_MORE_DOCS prematurely, so the documents that would have followed are never scored. Interestingly, MVInnerScorer (multi-valued) appears to have been coded properly and does not have this problem.
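
      To make the failure mode concrete, here is a toy model in plain Java with no Lucene classes. The class and method names are hypothetical, and the "faulty" variant is only an assumption about the shape of the bug (an exhausted term being mistaken for the end of all terms); it is not the original SVInnerScorer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model of the join's "to"-side iteration; plain Java, not Lucene code. */
class PrematureTerminationSketch {

  /** Docs of one term that survive deletion, similar to postings filtered by acceptDocs. */
  static List<Integer> liveDocs(List<Integer> postings, Set<Integer> deleted) {
    List<Integer> live = new ArrayList<>();
    for (int doc : postings) {
      if (!deleted.contains(doc)) {
        live.add(doc);
      }
    }
    return live;
  }

  /** ASSUMED faulty shape: a term with no live docs is treated as the end of all terms. */
  static List<Integer> collectFaulty(List<List<Integer>> postingsPerTerm, Set<Integer> deleted) {
    List<Integer> out = new ArrayList<>();
    for (List<Integer> postings : postingsPerTerm) {
      List<Integer> live = liveDocs(postings, deleted);
      if (live.isEmpty()) {
        return out; // premature stop: the remaining terms are never visited
      }
      out.addAll(live);
    }
    return out;
  }

  /** Expected behaviour: an exhausted term is skipped and iteration continues. */
  static List<Integer> collectExpected(List<List<Integer>> postingsPerTerm, Set<Integer> deleted) {
    List<Integer> out = new ArrayList<>();
    for (List<Integer> postings : postingsPerTerm) {
      out.addAll(liveDocs(postings, deleted));
    }
    return out;
  }
}
```

      With three terms that each match one document (0, 1 and 2) and document 1 deleted, collectFaulty yields only document 0, while collectExpected yields documents 0 and 2.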

        Activity

        David Smiley added a comment -

        A working implementation of the method should look like this:

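            int nextDocOutOfOrder() throws IOException {
              while (true) {
                // Drain the postings of the current term first.
                if (docsEnum != null) {
                  int docId = docsEnum.nextDoc();
                  if (docId == DocIdSetIterator.NO_MORE_DOCS) {
                    // This term has no (more) accepted docs: fall through to the next term.
                    docsEnum = null;
                  } else {
                    return doc = docId;
                  }
                }

                // Only stop once every collected term has been visited.
                if (upto == terms.size()) {
                  return doc = DocIdSetIterator.NO_MORE_DOCS;
                }

                // Seek to the next term; docs not in acceptDocs (e.g. deleted ones) are skipped.
                scoreUpto = upto;
                if (termsEnum.seekExact(terms.get(ords[upto++], spare), true)) {
                  docsEnum = reuse = termsEnum.docs(acceptDocs, reuse, DocsEnum.FLAG_NONE);
                }
              }
            }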

        I'll code a proper patch another day as it's very late right now.

        David Smiley added a comment -

        This patch fixes the bug.

        It also refactors MVInnerScorer so that it overrides a single new method containing only what it needs to do differently from its superclass, SVInnerScorer.
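
        The refactoring described here resembles a template-method arrangement, which can be sketched as follows. This is not the patch itself: the class names, the hook name nextFromCurrentTerm(), and the idea that the multi-valued variant differs by suppressing documents it has already emitted are all assumptions made for illustration. The sketch uses plain Java iterators in place of Lucene's TermsEnum and DocsEnum.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the single-valued scorer's iteration (not Lucene code). */
class SingleValuedIteration {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final Iterator<List<Integer>> terms; // live postings, one list per matched term
  protected Iterator<Integer> current;         // postings of the term being consumed, or null

  SingleValuedIteration(List<List<Integer>> postingsPerTerm) {
    this.terms = postingsPerTerm.iterator();
  }

  /** Shared loop: drain the current term, then advance to the next one. */
  final int nextDocOutOfOrder() {
    while (true) {
      if (current != null) {
        int doc = nextFromCurrentTerm();
        if (doc != NO_MORE_DOCS) {
          return doc;
        }
        current = null; // term exhausted: keep looping instead of stopping
      }
      if (!terms.hasNext()) {
        return NO_MORE_DOCS;
      }
      current = terms.next().iterator();
    }
  }

  /** Hook: the only step a subclass needs to change. */
  protected int nextFromCurrentTerm() {
    return current.hasNext() ? current.next() : NO_MORE_DOCS;
  }

  /** Convenience for experiments: drain the whole iteration into a list. */
  final List<Integer> drain() {
    List<Integer> out = new ArrayList<>();
    for (int doc = nextDocOutOfOrder(); doc != NO_MORE_DOCS; doc = nextDocOutOfOrder()) {
      out.add(doc);
    }
    return out;
  }
}

/** Hypothetical multi-valued variant: same loop, but each doc is emitted at most once. */
class MultiValuedIteration extends SingleValuedIteration {
  private final Set<Integer> alreadyEmitted = new HashSet<>();

  MultiValuedIteration(List<List<Integer>> postingsPerTerm) {
    super(postingsPerTerm);
  }

  @Override
  protected int nextFromCurrentTerm() {
    while (current.hasNext()) {
      int doc = current.next();
      if (alreadyEmitted.add(doc)) { // add() returns false for docs seen under an earlier term
        return doc;
      }
    }
    return NO_MORE_DOCS;
  }
}
```

        Given per-term postings of (3), (), (5, 3) and (7), draining the single-valued variant yields 3, 5, 3, 7 and the multi-valued variant yields 3, 5, 7. In both cases the empty second term, which models a term whose only document was deleted, does not end the iteration.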

        ASF subversion and git services added a comment -

        Commit 1502784 from David Smiley
        [ https://svn.apache.org/r1502784 ]

        LUCENE-5103: A join on A single-valued field with deleted docs scored too few docs. Did a little refactoring of the inner scorers too.

        ASF subversion and git services added a comment -

        Commit 1502785 from David Smiley
        [ https://svn.apache.org/r1502785 ]

        LUCENE-5103: A join on A single-valued field with deleted docs scored too few docs. Did a little refactoring of the inner scorers too.

        Robert Muir added a comment -

        This wasn't actually fixed on the 4.4 release branch.

        David Smiley added a comment -

        Oh; I didn't notice the release branch. I'll merge later today when I have time; same with SOLR-5034.

        ASF subversion and git services added a comment -

        Commit 1502832 from David Smiley
        [ https://svn.apache.org/r1502832 ]

        LUCENE-5103: A join on A single-valued field with deleted docs scored too few docs. Did a little refactoring of the inner scorers too.

        Martijn van Groningen added a comment -

        @David Thanks for fixing this!


          People

          • Assignee: David Smiley
          • Reporter: David Smiley
          • Votes: 0
          • Watchers: 3
