Lucene - Core / LUCENE-7304

Doc values based block join implementation

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      At query time, the block join relies on a bitset to find the previous parent doc while advancing the doc ID iterator. On large indices these bitsets can consume a large amount of JVM heap space. Also, because of the way these bitsets are typically populated, the 'FixedBitSet' implementation is used, which costs one bit per document in the segment regardless of how many parent docs there are.
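      To make the bitset-based lookup concrete, here is a minimal sketch. It assumes the usual Lucene block layout (children are indexed directly before their parent) and uses java.util.BitSet as a stand-in for Lucene's FixedBitSet; the class and method names (ParentBitsSketch, blockStart, parentOf, approxBytes) are hypothetical and not Lucene API. The heap estimate is a rough one-bit-per-doc figure rounded up to 64-bit words.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch of the bitset approach: one bit per doc, set for parents.
 * java.util.BitSet stands in for Lucene's FixedBitSet.
 */
public class ParentBitsSketch {

    /** First doc of the block ending at parentDoc: one past the previous parent. */
    static int blockStart(BitSet parentBits, int parentDoc) {
        // previousSetBit(-1) returns -1, so a block starting at doc 0 works too
        return parentBits.previousSetBit(parentDoc - 1) + 1;
    }

    /** Parent of a child doc: the next set bit at or after it. */
    static int parentOf(BitSet parentBits, int childDoc) {
        return parentBits.nextSetBit(childDoc);
    }

    /** Rough heap cost in bytes: one bit per doc, rounded up to whole 64-bit words. */
    static long approxBytes(long maxDoc) {
        return ((maxDoc + 63) / 64) * 8;
    }

    public static void main(String[] args) {
        // Blocks: children 0,1 -> parent 2; parent 3 (no children); children 4,5,6 -> parent 7
        BitSet parentBits = new BitSet(8);
        parentBits.set(2);
        parentBits.set(3);
        parentBits.set(7);
        System.out.println(blockStart(parentBits, 7));      // 4
        System.out.println(parentOf(parentBits, 5));        // 7
        System.out.println(approxBytes(1_000_000_000L));    // 125000000
    }
}
```

      The last line shows why heap usage grows with index size: a segment with a billion docs needs roughly 125 MB for a single parent bitset, however few parents it contains.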

      The idea I had was to replace the bitset usage with a numeric doc values field that stores offsets: each child doc stores how many doc IDs it is away from its parent doc, and each parent doc stores how many doc IDs it is away from its first child. At query time this information can be used to perform the block join.
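      A minimal sketch of the offset encoding, assuming children are indexed before their parent and using a plain long[] indexed by doc ID as a stand-in for one segment's numeric doc values. The issue does not say how a parent's value is told apart from a child's, so this sketch assumes (hypothetically) that a parent stores the negated distance to its first child, 0 if it has no children, while children always store a positive distance. All names here (BlockOffsetsSketch, encode, isParent, parentOf, blockStart) are illustrative, not Lucene API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the proposed offset encoding. A long[] indexed by
 * doc ID stands in for a numeric doc values field of one segment.
 * Assumption (not in the issue): a parent stores the NEGATED distance to its
 * first child (0 if childless); children store a positive distance to their parent.
 */
public class BlockOffsetsSketch {

    /** Encodes consecutive blocks; childCounts[i] is the number of children in block i. */
    static long[] encode(int[] childCounts) {
        int maxDoc = 0;
        for (int children : childCounts) {
            maxDoc += children + 1; // children plus their parent
        }
        long[] offsets = new long[maxDoc];
        int doc = 0;
        for (int children : childCounts) {
            int parent = doc + children; // children precede their parent
            for (int child = doc; child < parent; child++) {
                offsets[child] = parent - child; // always > 0 for children
            }
            offsets[parent] = -children; // <= 0 marks a parent
            doc = parent + 1;
        }
        return offsets;
    }

    static boolean isParent(long[] offsets, int doc) {
        return offsets[doc] <= 0;
    }

    /** Parent of a child doc; a parent doc maps to itself. */
    static int parentOf(long[] offsets, int doc) {
        return isParent(offsets, doc) ? doc : doc + (int) offsets[doc];
    }

    /** First doc of the block ending at parentDoc (the parent itself if childless). */
    static int blockStart(long[] offsets, int parentDoc) {
        return parentDoc + (int) offsets[parentDoc];
    }

    public static void main(String[] args) {
        // Blocks: children 0,1 -> parent 2; parent 3 (no children); children 4,5,6 -> parent 7
        long[] offsets = encode(new int[] {2, 0, 3});
        System.out.println(Arrays.toString(offsets)); // [2, 1, -2, 0, 3, 2, 1, -3]
        System.out.println(parentOf(offsets, 5));     // 7
        System.out.println(blockStart(offsets, 7));   // 4
    }
}
```

      Here blockStart plays the role of the previous-parent lookup: when advancing to a parent target, a child iterator could jump to blockStart(offsets, target) without any per-segment bitset on the heap. Whether that is faster in practice depends on how the doc values are read, which this sketch does not model.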

      I think another benefit of this approach is that external tools can now easily determine whether a doc is part of a block of documents. Perhaps this also helps index-time sorting?

        Attachments

        1. LUCENE-7304.patch
          25 kB
          Martijn van Groningen
        2. LUCENE-7304.patch
          25 kB
          Martijn van Groningen
        3. LUCENE_7304.patch
          22 kB
          Martijn van Groningen
        4. LUCENE-7304-20160606.patch
          110 kB
          Paul Elschot
        5. LUCENE-7304-20160531.patch
          10 kB
          Paul Elschot
        6. LUCENE-5092-20140313.patch
          25 kB
          Paul Elschot
        7. LUCENE_7304.patch
          17 kB
          Martijn van Groningen


            People

            • Assignee:
              Unassigned
            • Reporter:
              martijn.v.groningen Martijn van Groningen
            • Votes:
              0
            • Watchers:
              8

              Dates

              • Created:
               • Updated: