Lucene - Core
LUCENE-2723

Speed up Lucene's low level bulk postings read API

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 4.1
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from LUCENE-1410.

      The flex DocsEnum has a simple bulk-read API that reads the next chunk
      of docs/freqs. But it's a poor fit for intblock codecs like FOR/PFOR
      (from LUCENE-1410). This is not unlike sucking coffee through those
      tiny plastic coffee stirrers they hand out on airplanes that,
      surprisingly, also happen to function as a straw.

      As a result, we see no perf gain from using FOR/PFOR.
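For context on why intblock codecs want a block-oriented read API, here is plain frame-of-reference (FOR) bit packing: every value in a block is stored in the same number of bits, so a whole block decodes in one tight loop into an int[]. This is a generic illustration under stated assumptions, not the LUCENE-1410 codec: the class and method names (ForBlockSketch, pack, unpack) and the little-endian long[] layout are hypothetical, and PFOR's exception handling for outlier values is left out.

```java
import java.util.Arrays;

public class ForBlockSketch {

    // Packs each value into exactly bitsPerValue bits, little-endian within a long[].
    // Assumes 1 <= bitsPerValue <= 32 and every value is in [0, 2^bitsPerValue).
    static long[] pack(int[] values, int bitsPerValue) {
        long[] packed = new long[(values.length * bitsPerValue + 63) / 64];
        long bitPos = 0;
        for (int v : values) {
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            packed[word] |= ((long) v) << shift;
            // Value straddles two longs: write its high bits into the next word.
            if (shift + bitsPerValue > 64) {
                packed[word + 1] |= ((long) v) >>> (64 - shift);
            }
            bitPos += bitsPerValue;
        }
        return packed;
    }

    // Unpacks count values of bitsPerValue bits each into out[0..count).
    static void unpack(long[] packed, int bitsPerValue, int count, int[] out) {
        long mask = (1L << bitsPerValue) - 1;
        long bitPos = 0;
        for (int i = 0; i < count; i++) {
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long bits = packed[word] >>> shift;
            // Pull in the remaining high bits when the value straddles two longs.
            if (shift + bitsPerValue > 64) {
                bits |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (bits & mask);
            bitPos += bitsPerValue;
        }
    }

    public static void main(String[] args) {
        int[] deltas = {3, 1, 7, 2, 5, 4, 6, 0};  // every value fits in 3 bits
        long[] packed = pack(deltas, 3);
        int[] decoded = new int[deltas.length];
        unpack(packed, 3, deltas.length, decoded);
        System.out.println(Arrays.toString(decoded));  // [3, 1, 7, 2, 5, 4, 6, 0]
    }
}
```

Using a long[] means that, for bit widths up to 32, any single value spans at most two words, so the straddling case needs only one extra read or write.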

      I had hacked up a fix for this, described in my blog post at
      http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html

      I'm opening this issue to get that work to a committable point.

      So... I've worked out a new bulk-read API to address this performance
      bottleneck. It has some big changes over the current bulk-read API:

      • You can now also bulk-read positions (but not payloads), but I
        have yet to cut over positional queries.
      • The buffer contains doc deltas, not absolute values, for docIDs
        and positions (freqs are absolute).
      • Deleted docs are not filtered out.
      • The doc & freq buffers need not be "aligned". For fixed intblock
        codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
        Group varint, etc.) they won't be.
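The list above shifts some work onto the consumer. A minimal Java sketch of that consumer side, assuming a hypothetical block-reader interface (IntBlockReader with buffer() and fill(), plus an ArrayBlockReader test double) rather than Lucene's actual classes: it prefix-sums doc deltas into absolute docIDs, reads freqs as-is, skips deleted docs itself, and keeps separate offsets for the doc and freq buffers because they need not be aligned. It also assumes the caller knows how many postings to read (for example, the term's doc freq) and that the first delta is relative to docID 0.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class BulkPostingsSketch {

    // Hypothetical block reader: the codec owns the buffer and refills it in
    // whatever chunk size suits its encoding; the caller only reads from it.
    interface IntBlockReader {
        int[] buffer();
        // Refills buffer() and returns how many entries are valid; 0 means exhausted.
        int fill();
    }

    // Test double that serves a fixed int[] in chunks of chunkSize.
    static final class ArrayBlockReader implements IntBlockReader {
        private final int[] data;
        private final int[] buf;
        private int pos;

        ArrayBlockReader(int[] data, int chunkSize) {
            this.data = data;
            this.buf = new int[chunkSize];
        }

        public int[] buffer() { return buf; }

        public int fill() {
            int n = Math.min(buf.length, data.length - pos);
            System.arraycopy(data, pos, buf, 0, n);
            pos += n;
            return n;
        }
    }

    // Decodes docCount postings into {docID, freq} pairs for live docs only.
    // Doc and freq buffers use independent offsets since they may be unaligned.
    static List<int[]> liveDocsAndFreqs(IntBlockReader docDeltas, IntBlockReader freqs,
                                        int docCount, BitSet deletedDocs) {
        List<int[]> result = new ArrayList<>();
        int[] docBuf = docDeltas.buffer();
        int[] freqBuf = freqs.buffer();
        int docUpto = 0, docLimit = 0;
        int freqUpto = 0, freqLimit = 0;
        int doc = 0;
        for (int i = 0; i < docCount; i++) {
            if (docUpto == docLimit) { docLimit = docDeltas.fill(); docUpto = 0; }
            if (freqUpto == freqLimit) { freqLimit = freqs.fill(); freqUpto = 0; }
            doc += docBuf[docUpto++];        // buffer holds deltas, so accumulate
            int freq = freqBuf[freqUpto++];  // freqs are already absolute
            // Deleted docs are not filtered by the enum, so the caller checks here.
            // The freq was still consumed above so both streams stay in step.
            if (!deletedDocs.get(doc)) {
                result.add(new int[] {doc, freq});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Postings for docs 2, 5, 6, 10, 11 stored as deltas from the previous doc.
        int[] deltas = {2, 3, 1, 4, 1};
        int[] freqs = {1, 4, 2, 1, 3};
        BitSet deleted = new BitSet();
        deleted.set(6);
        // Different chunk sizes make the two buffers deliberately unaligned.
        List<int[]> live = liveDocsAndFreqs(
            new ArrayBlockReader(deltas, 4), new ArrayBlockReader(freqs, 3),
            deltas.length, deleted);
        for (int[] p : live) {
            System.out.println("doc=" + p[0] + " freq=" + p[1]);
        }
    }
}
```

Running main prints docs 2, 5, 10 and 11 with freqs 1, 4, 1 and 3; doc 6 is decoded but dropped because it is marked deleted. Positions could be consumed the same way through their own delta buffer, since they are also delivered as deltas.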

      It's still a work in progress...

      Attachments

      1. LUCENE-2723-termscorer.patch
        21 kB
        Simon Willnauer
      2. LUCENE-2723-termscorer.patch
        20 kB
        Simon Willnauer
      3. LUCENE-2723-termscorer.patch
        20 kB
        Simon Willnauer
      4. LUCENE-2723-BulkEnumWrapper.patch
        13 kB
        Simon Willnauer
      5. LUCENE-2723.patch
        396 kB
        Michael McCandless
      6. LUCENE-2723.patch
        401 kB
        Michael McCandless
      7. LUCENE-2723.patch
        170 kB
        Simon Willnauer
      8. LUCENE-2723.patch
        153 kB
        Robert Muir
      9. LUCENE-2723.patch
        153 kB
        Simon Willnauer
      10. LUCENE-2723.patch
        4 kB
        Simon Willnauer
      11. LUCENE-2723_wastedint.patch
        23 kB
        Robert Muir
      12. LUCENE-2723_termscorer.patch
        16 kB
        Robert Muir
      13. LUCENE-2723_openEnum.patch
        0.8 kB
        Yonik Seeley
      14. LUCENE-2723_facetPerSeg.patch
        8 kB
        Yonik Seeley
      15. LUCENE-2723_facetPerSeg.patch
        33 kB
        Yonik Seeley
      16. LUCENE-2723_bulkvint.patch
        10 kB
        Robert Muir


            People

            • Assignee: Michael McCandless
            • Reporter: Michael McCandless
            • Votes: 0
            • Watchers: 2
