  1. Lucene - Core
  2. LUCENE-4283

Support more frequent skip with Block Postings Format

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This change works on the new bulk branch.

      Currently, our BlockPostingsFormat only supports skipInterval == blockSize. Every time the skipper reaches the last level-0 skip point, we have to decode a whole block to read the doc/freq data. Also, a higher-level skip list is created only for terms with df > blockSize^k, which means that for most terms skipping is just a linear scan. If we increase the current blockSize for better bulk I/O performance, the current skip setting will become a bottleneck.

      For ForPF, the encoded block can easily be split if we set skipInterval = 32*k.
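
      To make the trade-off concrete, here is a minimal sketch under a simplified model. It is not the code from the patches or from Lucene itself. It assumes that skip level k (0-based) records one entry every skipInterval^(k+1) documents, and that a block is bit-packed at one fixed width so that every group of 32 values ends on a byte boundary. The class and method names (SkipSketch, skipLevels, subBlockByteOffset) are hypothetical.

```java
public class SkipSketch {

    /**
     * Number of skip levels holding at least one entry, assuming level k
     * (0-based) stores one entry every skipInterval^(k+1) documents.
     * This is a simplified model, not Lucene's actual formula.
     */
    public static int skipLevels(long docFreq, int skipInterval) {
        int levels = 0;
        long span = skipInterval;          // docs covered by one entry at the current level
        while (span <= docFreq) {
            levels++;
            span *= skipInterval;          // the next level is skipInterval times coarser
        }
        return levels;
    }

    /**
     * Byte offset of the j-th group of 32 values inside a block packed at a
     * fixed bit width. 32 * bitsPerValue bits is always 4 * bitsPerValue bytes,
     * so a skip point placed every 32*k values never lands mid-byte.
     */
    public static int subBlockByteOffset(int j, int bitsPerValue) {
        return j * 32 * bitsPerValue / 8;
    }

    public static void main(String[] args) {
        long[] docFreqs = {100, 5000, 1_000_000};
        for (long df : docFreqs) {
            System.out.println("df=" + df
                + " levels(skipInterval=128)=" + skipLevels(df, 128)
                + " levels(skipInterval=32)=" + skipLevels(df, 32));
        }
        System.out.println("byte offset of sub-block 1 at 5 bits/value: "
            + subBlockByteOffset(1, 5));
    }
}
```

      In this model a term with df = 100 gets no skip entries at all when skipInterval is 128, but one level when skipInterval is 32, which is the linear-scan case described above.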

        Attachments

        1. LUCENE-4283-record-skip&inlining-scanning.patch
          13 kB
          Han Jiang
        2. LUCENE-4283-record-next-skip.patch
          6 kB
          Han Jiang
        3. LUCENE-4283-codes-cleanup.patch
          66 kB
          Han Jiang
        4. LUCENE-4283-small-interval-partially.patch
          86 kB
          Han Jiang
        5. LUCENE-4283-small-interval-fully.patch
          81 kB
          Han Jiang
        6. LUCENE-4283-slow.patch
          65 kB
          Han Jiang
        7. LUCENE-4283-buggy.patch
          71 kB
          Han Jiang
        8. LUCENE-4283-buggy.patch
          75 kB
          Han Jiang

              People

              • Assignee:
                Unassigned
                Reporter:
                billy Han Jiang
              • Votes:
                0
                Watchers:
                2
