Solr
  1. Solr
  2. SOLR-2950

QueryElevationComponent needlessly looks up document ids

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      The QueryElevationComponent needlessly instantiates a FieldCache and does look ups in it for every document. If we flipped things around a bit and got Lucene internal doc ids on inform() we could then simply do a much smaller and faster lookup during the sort.

      1. SOLR-2950.patch
        3 kB
        Yonik Seeley
      2. SOLR-2950.patch
        65 kB
        Grant Ingersoll
      3. SOLR-2950.patch
        65 kB
        Grant Ingersoll
      4. SOLR-2950.patch
        77 kB
        Grant Ingersoll

        Issue Links

          Activity

          Hide
          Yonik Seeley added a comment -

          Instead of doing everything on inform (which isn't great for NRT), we should just do it on-demand in the comparator in setNextReader() for only those uniqueKeys that were boosted)

          We could cache the uniqueKey -> docid across queries, but not sure it's worth it at this point (assuming at most a handful of docs are boosted per-query). And if we did want some sort of uniqueKey -> docid cache it would make most sense to be an internal cache in SolrIndexSearcher, not private to the QEC.

          Show
          Yonik Seeley added a comment - Instead of doing everything on inform (which isn't great for NRT), we should just do it on-demand in the comparator in setNextReader() for only those uniqueKeys that were boosted) We could cache the uniqueKey -> docid across queries, but not sure it's worth it at this point (assuming at most a handful of docs are boosted per-query). And if we did want some sort of uniqueKey -> docid cache it would make most sense to be an internal cache in SolrIndexSearcher, not private to the QEC.
          Hide
          Yonik Seeley added a comment -

          It would probably be most performant to do the lookup perSegment (i.e. in setNextReader) and remove documents as they are found (i.e. if the doc exists in segment1, don't bother looking it up in further segments). This will also mean that we only do hash lookups in the SentinelIntSet when there actually exists a boosted doc in the segment.

          Show
          Yonik Seeley added a comment - It would probably be most performant to do the lookup perSegment (i.e. in setNextReader) and remove documents as they are found (i.e. if the doc exists in segment1, don't bother looking it up in further segments). This will also mean that we only do hash lookups in the SentinelIntSet when there actually exists a boosted doc in the segment.
          Hide
          Grant Ingersoll added a comment -

          First draft. This just does the mapping in the constructor to the comparator. We could do setNextReader, but I can't imagine it will really make much difference given the small number of items that we typically would expect to be elevated.

          Show
          Grant Ingersoll added a comment - First draft. This just does the mapping in the constructor to the comparator. We could do setNextReader, but I can't imagine it will really make much difference given the small number of items that we typically would expect to be elevated.
          Hide
          Grant Ingersoll added a comment -

          Note, I'm going to try the way Yonik suggested too, but wanted to put this up as a first draft.

          Show
          Grant Ingersoll added a comment - Note, I'm going to try the way Yonik suggested too, but wanted to put this up as a first draft.
          Hide
          Grant Ingersoll added a comment -

          Also, I need to double check my comparator understanding b/c perhaps the doc ids are off due to getting a top level reader.

          Show
          Grant Ingersoll added a comment - Also, I need to double check my comparator understanding b/c perhaps the doc ids are off due to getting a top level reader.
          Hide
          Yonik Seeley added a comment -

          I can't imagine it will really make much difference given the small number of items that we typically would expect to be elevated.

          The fact that it will be a small number of elevated docs is entirely my point - that means that if we do it per segment, that there will normally be no documents elevated in a specific segment and the hash lookup can be skipped (and that would be a sizeable gain for something simple like a term query). You're right about small sets - it doesn't matter if the set size is 1 or 10 if you do need to do the lookup.

          Show
          Yonik Seeley added a comment - I can't imagine it will really make much difference given the small number of items that we typically would expect to be elevated. The fact that it will be a small number of elevated docs is entirely my point - that means that if we do it per segment, that there will normally be no documents elevated in a specific segment and the hash lookup can be skipped (and that would be a sizeable gain for something simple like a term query). You're right about small sets - it doesn't matter if the set size is 1 or 10 if you do need to do the lookup.
          Hide
          Grant Ingersoll added a comment -

          This patch moves the work to setNextReader

          Show
          Grant Ingersoll added a comment - This patch moves the work to setNextReader
          Hide
          Grant Ingersoll added a comment -

          Minor cleanup. I think this is ready to go and will likely commit later today or tomorrow.

          Show
          Grant Ingersoll added a comment - Minor cleanup. I think this is ready to go and will likely commit later today or tomorrow.
          Hide
          Yonik Seeley added a comment -

          OK, just had a chance to view the comparator part of this patch.

          Here's a patch that fixes

          • minor check-for-null for fields() and terms() which can return null
          • even though docsEnum returns something, it may be deleted (i.e. need to check for NO_MORE_DOCS)
          • use liveDocs when requesting the docsEnum so we won't use a deleted (overwritten) doc.

          The last two issues would both cause us to miss elevated documents if they have been updated and an old deleted version still exists in the index.

          Show
          Yonik Seeley added a comment - OK, just had a chance to view the comparator part of this patch. Here's a patch that fixes minor check-for-null for fields() and terms() which can return null even though docsEnum returns something, it may be deleted (i.e. need to check for NO_MORE_DOCS) use liveDocs when requesting the docsEnum so we won't use a deleted (overwritten) doc. The last two issues would both cause us to miss elevated documents if they have been updated and an old deleted version still exists in the index.
          Hide
          Grant Ingersoll added a comment -

          +1, go ahead and commit.

          Show
          Grant Ingersoll added a comment - +1, go ahead and commit.

            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Grant Ingersoll
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development