  1. Lucene - Core
  2. LUCENE-4283

Support more frequent skip with Block Postings Format

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Later

    Description

      This change works on the new bulk branch.

      Currently, our BlockPostingsFormat only supports skipInterval == blockSize. Every time the skipper reaches the last level-0 skip point, we have to decode a whole block to read the doc/freq data. Also, higher-level skip lists are created only for terms with df > blockSize^k, which means that for most terms skipping is just a linear scan. If we increase the current blockSize for better bulk I/O performance, the current skip setting will become a bottleneck.
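
      To make the df > blockSize^k condition concrete, here is a minimal sketch in Java. It is not the skip-list code from the branch: the class and method names are hypothetical, and it assumes that the k-th skip level exists exactly when df > blockSize^k, counting from k = 1.

```java
/**
 * Hypothetical sketch, not Lucene's skip-list writer.
 * Assumption: skip level k (k >= 1) is only built when df > blockSize^k.
 */
final class SkipLevelSketch {

    /** Returns how many skip levels a term with the given docFreq gets. */
    static int skipLevels(long docFreq, int blockSize) {
        if (blockSize < 2) {
            throw new IllegalArgumentException("blockSize must be >= 2");
        }
        int levels = 0;
        long threshold = blockSize; // blockSize^1
        while (docFreq > threshold) {
            levels++;
            if (threshold > Long.MAX_VALUE / blockSize) {
                break; // the next power would overflow a long
            }
            threshold *= blockSize; // move on to blockSize^(levels + 1)
        }
        return levels;
    }

    public static void main(String[] args) {
        long[] docFreqs = {100, 1_000, 20_000};
        for (long df : docFreqs) {
            System.out.println("df=" + df + " levels=" + skipLevels(df, 128));
        }
    }
}
```

      Under these assumptions and with blockSize = 128, a term needs more than 128 documents to get any skip data at all and more than 16384 to get a second level, which is why enlarging blockSize makes the problem worse.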

      For ForPF, the encoded block can easily be split if we set skipInterval = 32*k.
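
      One way to see why multiples of 32 are convenient: if a block is bit-packed with a single width of b bits per value, every run of 32 values occupies exactly b 32-bit ints, so each sub-block of 32*k values starts on an int boundary. The sketch below illustrates this arithmetic only. The class and method names are hypothetical, and the contiguous single-width layout is an assumption that may not match the actual ForPF encoding in the patches.

```java
/**
 * Hypothetical sketch of splitting a bit-packed block at skip points.
 * Assumption: values are packed contiguously with one bit width per block.
 */
final class ForSplitSketch {

    /** Returns the offset (in 32-bit ints) at which each sub-block starts. */
    static int[] skipPointIntOffsets(int blockSize, int bitsPerValue, int skipInterval) {
        if (skipInterval <= 0 || skipInterval % 32 != 0) {
            throw new IllegalArgumentException("skipInterval must be a positive multiple of 32");
        }
        if (blockSize <= 0 || blockSize % skipInterval != 0) {
            throw new IllegalArgumentException("blockSize must be a positive multiple of skipInterval");
        }
        if (bitsPerValue < 1 || bitsPerValue > 32) {
            throw new IllegalArgumentException("bitsPerValue must be in [1, 32]");
        }
        int points = blockSize / skipInterval;
        int[] offsets = new int[points];
        for (int i = 0; i < points; i++) {
            // 32 values * bitsPerValue bits == bitsPerValue whole ints
            int groupsOf32 = i * (skipInterval / 32);
            offsets[i] = groupsOf32 * bitsPerValue;
        }
        return offsets;
    }

    public static void main(String[] args) {
        int[] offsets = skipPointIntOffsets(128, 5, 32);
        System.out.println(java.util.Arrays.toString(offsets));
    }
}
```

      With any other interval, a sub-block could begin in the middle of an int for some bit widths, and the decoder would have to shift and mask across the boundary before it could start reading.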

      Attachments

        1. LUCENE-4283-buggy.patch
          75 kB
          Han Jiang
        2. LUCENE-4283-buggy.patch
          71 kB
          Han Jiang
        3. LUCENE-4283-slow.patch
          65 kB
          Han Jiang
        4. LUCENE-4283-small-interval-fully.patch
          81 kB
          Han Jiang
        5. LUCENE-4283-small-interval-partially.patch
          86 kB
          Han Jiang
        6. LUCENE-4283-codes-cleanup.patch
          66 kB
          Han Jiang
        7. LUCENE-4283-record-next-skip.patch
          6 kB
          Han Jiang
        8. LUCENE-4283-record-skip&inlining-scanning.patch
          13 kB
          Han Jiang

            People

              Assignee: Unassigned
              Reporter: Han Jiang (billy)