Lucene - Core / LUCENE-4771

Query-time join collectors could maybe be more efficient

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/join
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I was looking at these collectors on LUCENE-4765 and I noticed:

      • SingleValued collector (SV) pulls FieldCache.getTerms and adds the bytes to a BytesRefHash on every collect() call.
      • MultiValued collector (MV) pulls FieldCache.getDocTermsOrds but doesn't use the ords; it just looks up each value and adds the bytes on every collect() call.

      I think instead it's worth investigating whether SV should use getTermsIndex, and whether both collectors should just collect their per-segment ords in something like a BitSet[maxOrd].

      When asked for the terms at the end in getCollectorTerms(), they could merge these into one BytesRefHash.
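To make the idea concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: `Segment`, `OrdCollector`, and `setNextSegment` are hypothetical stand-ins for a per-segment terms index and a collector, and a `TreeSet<String>` stands in for the final BytesRefHash. It only illustrates the shape of the proposal: collect() sets a bit per ord, and terms are resolved once at the end.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class OrdJoinSketch {

    /** Hypothetical stand-in for one segment: sorted term dictionary plus doc -> ord mapping. */
    static final class Segment {
        final String[] sortedTerms; // ord -> term, sorted
        final int[] docToOrd;       // doc -> ord, -1 when the doc has no value

        Segment(String[] sortedTerms, int[] docToOrd) {
            this.sortedTerms = sortedTerms;
            this.docToOrd = docToOrd;
        }
    }

    /** Collects only ords per segment; term bytes are looked up once, at the end. */
    static final class OrdCollector {
        private final List<Segment> segments = new ArrayList<>();
        private final List<BitSet> seenOrds = new ArrayList<>();
        private Segment current;
        private BitSet currentOrds;

        void setNextSegment(Segment segment) {
            current = segment;
            // One bit per ord in this segment, i.e. a BitSet sized to maxOrd.
            currentOrds = new BitSet(segment.sortedTerms.length);
            segments.add(segment);
            seenOrds.add(currentOrds);
        }

        void collect(int doc) {
            int ord = current.docToOrd[doc];
            if (ord >= 0) {
                currentOrds.set(ord); // no term lookup and no hashing per hit
            }
        }

        /** Merges the per-segment ord sets into one set of terms. */
        SortedSet<String> getCollectorTerms() {
            SortedSet<String> terms = new TreeSet<>();
            for (int i = 0; i < segments.size(); i++) {
                String[] dict = segments.get(i).sortedTerms;
                BitSet ords = seenOrds.get(i);
                for (int ord = ords.nextSetBit(0); ord >= 0; ord = ords.nextSetBit(ord + 1)) {
                    terms.add(dict[ord]); // each distinct ord is resolved exactly once
                }
            }
            return terms;
        }
    }

    public static void main(String[] args) {
        Segment seg0 = new Segment(new String[] {"a", "b", "c"}, new int[] {0, 2, -1, 2});
        Segment seg1 = new Segment(new String[] {"b", "c", "d"}, new int[] {1, 0});
        OrdCollector collector = new OrdCollector();
        collector.setNextSegment(seg0);
        for (int doc = 0; doc < 4; doc++) {
            collector.collect(doc);
        }
        collector.setNextSegment(seg1);
        collector.collect(1);
        System.out.println(collector.getCollectorTerms());
    }
}
```

The per-hit cost drops to one array read and one bit set; the number of term lookups is bounded by the number of distinct ords seen, not by the number of collected documents.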

      Of course, if you are going to turn around and execute the query against the same searcher anyway (is this the typical case?), this could be even more efficient: there would be no need to hash or instantiate all the terms in memory. We could postpone the lookups to SeekingTermSetTermsEnum.accept()/nextSeekTerm(), I think... somehow.
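One possible reading of the deferred variant, again as a plain-Java sketch rather than Lucene code: keep only the per-segment ord bit sets and answer "was this term collected?" on demand. `DeferredTermFilter` is a hypothetical name, and its `accept(String)` signature is an assumption made for illustration; it is not the signature of SeekingTermSetTermsEnum.accept(). It also assumes the candidate term can be looked up in each from-side segment dictionary, which is the part the description leaves open.

```java
import java.util.Arrays;
import java.util.BitSet;

final class DeferredTermFilter {
    private final String[][] segmentDicts; // per segment: sorted terms, index = ord
    private final BitSet[] collectedOrds;  // per segment: ords seen during collection

    DeferredTermFilter(String[][] segmentDicts, BitSet[] collectedOrds) {
        this.segmentDicts = segmentDicts;
        this.collectedOrds = collectedOrds;
    }

    /** Returns true if the term was collected in any segment; no term set is ever built. */
    boolean accept(String term) {
        for (int i = 0; i < segmentDicts.length; i++) {
            // Binary search plays the role of a term -> ord lookup in the segment.
            int ord = Arrays.binarySearch(segmentDicts[i], term);
            if (ord >= 0 && collectedOrds[i].get(ord)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[][] dicts = {{"a", "b", "c"}, {"b", "c", "d"}};
        BitSet seg0 = new BitSet();
        seg0.set(0); // "a" collected in segment 0
        BitSet seg1 = new BitSet();
        seg1.set(2); // "d" collected in segment 1
        DeferredTermFilter filter = new DeferredTermFilter(dicts, new BitSet[] {seg0, seg1});
        System.out.println(filter.accept("a"));
        System.out.println(filter.accept("b"));
    }
}
```

The trade-off in this sketch is memory versus lookup cost: nothing is hashed or materialized, but every candidate term pays one dictionary lookup per segment.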

        Attachments

        1. LUCENE-4771-prototype.patch
          18 kB
          Martijn van Groningen
        2. LUCENE-4771_prototype.patch
          15 kB
          Robert Muir
        3. LUCENE-4771_prototype_without_bug.patch
          15 kB
          Robert Muir

          Activity

            People

            • Assignee:
              Unassigned
            • Reporter:
              Robert Muir (rcmuir)
            • Votes:
              0
            • Watchers:
              4
