SOLR-5624: Enable QueryResultCache for CollapsingQParserPlugin

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.6
    • Fix Version/s: 4.7, 6.0
    • Component/s: None
    • Labels:
      None
    1. SOLR-5624.patch
      7 kB
      Joel Bernstein
    2. SOLR-5624.patch
      6 kB
      Joel Bernstein
    3. SOLR-5624.patch
      3 kB
      Joel Bernstein

        Activity

        Joel Bernstein added a comment -

        Initial patch created from trunk, lightly tested. Looks good.

        Joel Bernstein added a comment -

        New patch. This patch provides logic that will cause caching to fail when elevated docs are present.

        Joel Bernstein added a comment -

        Added a new patch that just expands one of the tests.

        ASF subversion and git services added a comment -

        Commit 1566071 from Joel Bernstein in branch 'dev/trunk'
        [ https://svn.apache.org/r1566071 ]

        SOLR-5624: Enable QueryResultCache for CollapsingQParserPlugin

        ASF subversion and git services added a comment -

        Commit 1566122 from Joel Bernstein in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1566122 ]

        SOLR-5624: Enable QueryResultCache for CollapsingQParserPlugin

        ASF subversion and git services added a comment -

        Commit 1566309 from Joel Bernstein in branch 'dev/trunk'
        [ https://svn.apache.org/r1566309 ]

        SOLR-5624: Guard against NPE during cache warming

        ASF subversion and git services added a comment -

        Commit 1566312 from Joel Bernstein in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1566312 ]

        SOLR-5624: Guard against NPE during cache warming

        ASF subversion and git services added a comment -

        Commit 1567640 from Joel Bernstein in branch 'dev/trunk'
        [ https://svn.apache.org/r1567640 ]

        SOLR-5624: check for elevated documents in hashCode()

        Joel Bernstein added a comment -

        The previous commit checks for elevated documents in the query's hashCode() method. This is needed because the QueryElevationComponent adds the elevated documents to the request context only after the original query is constructed, so they are not yet visible in the request context at construction time.

        When checking for the query in the QueryResultCache, we need to take into account the presence of elevated documents. The check that was added to the hashCode() method does this.

        We still need to look for elevated docs in the getFilterCollector() method in case caching was turned off.
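
To make the reasoning above concrete, here is a minimal, self-contained sketch of a cache key that reads the elevated ids lazily inside hashCode() and equals(). It is not the code from the patch: the class names, the `BOOSTED` context key, and the plain `HashMap` standing in for the QueryResultCache are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class ElevationAwareCacheKey {
    // Hypothetical context key; stands in for whatever key the elevation step uses.
    static final String BOOSTED = "BOOSTED";

    /** Simplified stand-in for a collapse query that is used as a cache key. */
    static final class CollapseQuery {
        private final String field;
        private final Map<String, Object> requestContext;
        private Set<String> boosted; // resolved lazily, after construction

        CollapseQuery(String field, Map<String, Object> requestContext) {
            this.field = field;
            this.requestContext = requestContext;
        }

        // Look up the elevated ids on use, because they are not yet in the
        // context when the constructor runs.
        @SuppressWarnings("unchecked")
        private Set<String> boosted() {
            if (boosted == null) {
                boosted = (Set<String>) requestContext.get(BOOSTED);
            }
            return boosted;
        }

        @Override
        public int hashCode() {
            return Objects.hash(field, boosted());
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CollapseQuery)) return false;
            CollapseQuery other = (CollapseQuery) o;
            return field.equals(other.field)
                && Objects.equals(boosted(), other.boosted());
        }
    }

    public static void main(String[] args) {
        // A plain HashMap plays the role of the result cache in this sketch.
        Map<CollapseQuery, String> cache = new HashMap<>();
        cache.put(new CollapseQuery("groupid", new HashMap<>()), "results without elevation");

        Map<String, Object> elevatedCtx = new HashMap<>();
        CollapseQuery elevated = new CollapseQuery("groupid", elevatedCtx);
        // The elevation step runs after the query object already exists.
        elevatedCtx.put(BOOSTED, Collections.singleton("1"));

        System.out.println(cache.containsKey(elevated)); // false
        System.out.println(cache.containsKey(new CollapseQuery("groupid", new HashMap<>()))); // true
    }
}
```

Because the lookup happens when the key is hashed rather than when it is built, a request with elevated documents cannot hit an entry cached for the same query without elevation.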

        ASF subversion and git services added a comment -

        Commit 1567649 from Joel Bernstein in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1567649 ]

        SOLR-5624: check for elevated documents in hashCode()

        David Boychuck added a comment -

        Hi Joel,

        I sent you an email but I'm not sure if you received it. I ran into a bit of trouble using the CollapsingQParserPlugin with elevated documents. To put it simply, I want to exclude the other documents in a group when one member of the group is contained in the elevated document set. I'm not sure this is currently possible because, as you explain above, elevated documents are added to the request context after the original query is constructed.

        To better illustrate the problem: suppose I have two documents, docid=1 and docid=2, and both have a groupid of 'a'. If a grouped query scores docid 2 first in the results but I have elevated docid 1, then both documents are shown in the results, when I really only want the elevated document to be shown.

        Is this something that would be difficult to implement? Any help is appreciated.

        David Boychuck added a comment -

        I think the solution would be to remove the documents from liveDocs that share the same groupid in the getBoostDocs() function. Let me know if this makes any sense. I'll continue working towards a solution in the meantime.

        private IntOpenHashSet getBoostDocs(SolrIndexSearcher indexSearcher, Set<String> boosted) throws IOException {
          IntOpenHashSet boostDocs = null;
          if (boosted != null) {
            SchemaField idField = indexSearcher.getSchema().getUniqueKeyField();
            String fieldName = idField.getName();
            // Copy the elevated unique keys so entries can be removed once they are resolved.
            HashSet<BytesRef> localBoosts = new HashSet<BytesRef>(boosted.size() * 2);
            for (String id : boosted) {
              localBoosts.add(new BytesRef(id));
            }

            boostDocs = new IntOpenHashSet(boosted.size() * 2);

            List<AtomicReaderContext> leaves = indexSearcher.getTopReaderContext().leaves();
            TermsEnum termsEnum = null;
            DocsEnum docsEnum = null;
            for (AtomicReaderContext leaf : leaves) {
              AtomicReader reader = leaf.reader();
              int docBase = leaf.docBase;
              Bits liveDocs = reader.getLiveDocs();
              Terms terms = reader.terms(fieldName);
              termsEnum = terms.iterator(termsEnum);
              Iterator<BytesRef> it = localBoosts.iterator();
              while (it.hasNext()) {
                BytesRef ref = it.next();
                if (termsEnum.seekExact(ref)) {
                  docsEnum = termsEnum.docs(liveDocs, docsEnum);
                  int doc = docsEnum.nextDoc();
                  // nextDoc() returns DocIdSetIterator.NO_MORE_DOCS, not -1, when no live document matches.
                  if (doc != DocIdSetIterator.NO_MORE_DOCS) {
                    // Found the document: record its global doc id.
                    boostDocs.add(doc + docBase);

                    // HERE: remove any documents that share the groupid, not only the docid.
                    it.remove();
                  }
                }
              }
            }
          }

          return boostDocs;
        }
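
The behaviour being asked for can be sketched independently of Solr and Lucene. The example below is only an illustration under stated assumptions: the `Doc` class, the `collapse` method, and the in-memory list are hypothetical and do not correspond to anything in CollapsingQParserPlugin. It shows which document ends up representing each group (the elevated member if there is one, otherwise the highest-scoring member) and does not attempt to model result ranking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ElevatedCollapseSketch {
    /** Minimal stand-in for an indexed document. */
    static final class Doc {
        final String id;
        final String groupId;
        final float score;

        Doc(String id, String groupId, float score) {
            this.id = id;
            this.groupId = groupId;
            this.score = score;
        }
    }

    /**
     * Keeps one document per group: the elevated member if the group has one,
     * otherwise the highest-scoring member. Groups are returned in the order
     * in which they were first seen.
     */
    static List<String> collapse(List<Doc> docs, Set<String> elevatedIds) {
        Map<String, Doc> heads = new LinkedHashMap<>();
        for (Doc doc : docs) {
            Doc head = heads.get(doc.groupId);
            if (head == null) {
                heads.put(doc.groupId, doc);
                continue;
            }
            boolean headElevated = elevatedIds.contains(head.id);
            boolean docElevated = elevatedIds.contains(doc.id);
            if (docElevated && !headElevated) {
                // An elevated document always displaces a non-elevated group head.
                heads.put(doc.groupId, doc);
            } else if (docElevated == headElevated && doc.score > head.score) {
                // Same elevation status: fall back to comparing scores.
                heads.put(doc.groupId, doc);
            }
        }
        List<String> ids = new ArrayList<>();
        for (Doc head : heads.values()) {
            ids.add(head.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("1", "a", 1.0f),
            new Doc("2", "a", 5.0f),
            new Doc("3", "b", 2.0f));

        // Without elevation, the best-scoring member of group 'a' wins.
        System.out.println(collapse(docs, Collections.<String>emptySet())); // [2, 3]
        // With docid 1 elevated, it replaces docid 2 as the head of group 'a'.
        System.out.println(collapse(docs, Collections.singleton("1")));     // [1, 3]
    }
}
```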
        
        Hoss Man added a comment -

        David Boychuck: please post your question to the solr-user list, or (if you have an improvement you'd like to contribute) open a new Jira issue to track it.

        Posting a comment in a resolved issue like this is almost certain to get lost and overlooked.


          People

          • Assignee: Joel Bernstein
          • Reporter: David Boychuck
          • Votes: 0
          • Watchers: 3
