  1. Lucene - Core
  2. LUCENE-2723

Speed up Lucene's low level bulk postings read API

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 4.1
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

    Description

      Spinoff from LUCENE-1410.

      The flex DocsEnum has a simple bulk-read API that reads the next chunk
      of docs/freqs. But it's a poor fit for intblock codecs like FOR/PFOR
      (from LUCENE-1410). This is not unlike sucking coffee through those
      tiny plastic coffee stirrers they hand out on airplanes that,
      surprisingly, also happen to function as a straw.

      As a result we see no perf gain from using FOR/PFOR.

      I had hacked up a fix for this, described in my blog post at
      http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html

      I'm opening this issue to get that work to a committable point.

      So... I've worked out a new bulk-read API to address this performance
      bottleneck. It has some big changes over the current bulk-read API:

      • You can now also bulk-read positions (but not payloads), but I
        have yet to cut over positional queries.
      • The buffers contain deltas, not absolute values, for docIDs
        and positions (freqs are absolute).
      • Deleted docs are not filtered out.
      • The doc & freq buffers need not be "aligned". For fixed intblock
        codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
        Group varint, etc.) they won't be.
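
      To make the delta and deleted-docs points concrete, here is a minimal
      sketch of what a consumer of such a buffer would have to do. It is not
      code from the patch: the class name, the method collectLiveDocs, and
      the use of a java.util.BitSet for deletions are all assumptions made
      for illustration. It only handles docIDs, since the freq buffer is not
      guaranteed to be aligned with the doc buffer.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; none of these names come from the patch.
class DeltaBufferSketch {

    /**
     * Walks a buffer of doc deltas, rebuilds absolute docIDs by running sum,
     * and skips deleted docs, since the bulk API does not filter them.
     *
     * @param docDeltas buffer filled by a (hypothetical) bulk read
     * @param count     number of valid entries in the buffer
     * @param lastDocID last absolute docID seen before this buffer (0 at start)
     * @param deleted   bit set where a set bit marks a deleted docID
     */
    static List<Integer> collectLiveDocs(int[] docDeltas, int count,
                                         int lastDocID, BitSet deleted) {
        List<Integer> live = new ArrayList<>();
        int doc = lastDocID;
        for (int i = 0; i < count; i++) {
            doc += docDeltas[i];          // delta -> absolute docID
            if (!deleted.get(doc)) {      // caller must apply deletions itself
                live.add(doc);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        int[] deltas = {3, 4, 1, 10};     // absolute docIDs: 3, 7, 8, 18
        BitSet deleted = new BitSet();
        deleted.set(7);                   // doc 7 is deleted
        System.out.println(collectLiveDocs(deltas, deltas.length, 0, deleted));
    }
}
```

      The running sum has to carry over from one buffer to the next, which
      is why the sketch takes lastDocID as a parameter rather than always
      starting from zero.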

      It's still a work in progress...

      Attachments

        1. LUCENE-2723.patch
          396 kB
          Michael McCandless
        2. LUCENE-2723.patch
          401 kB
          Michael McCandless
        3. LUCENE-2723.patch
          170 kB
          Simon Willnauer
        4. LUCENE-2723.patch
          153 kB
          Robert Muir
        5. LUCENE-2723.patch
          153 kB
          Simon Willnauer
        6. LUCENE-2723_termscorer.patch
          16 kB
          Robert Muir
        7. LUCENE-2723-termscorer.patch
          21 kB
          Simon Willnauer
        8. LUCENE-2723-termscorer.patch
          20 kB
          Simon Willnauer
        9. LUCENE-2723-termscorer.patch
          20 kB
          Simon Willnauer
        10. LUCENE-2723_wastedint.patch
          23 kB
          Robert Muir
        11. LUCENE-2723_openEnum.patch
          0.8 kB
          Yonik Seeley
        12. LUCENE-2723_facetPerSeg.patch
          8 kB
          Yonik Seeley
        13. LUCENE-2723_facetPerSeg.patch
          33 kB
          Yonik Seeley
        14. LUCENE-2723_bulkvint.patch
          10 kB
          Robert Muir
        15. LUCENE-2723.patch
          4 kB
          Simon Willnauer
        16. LUCENE-2723-BulkEnumWrapper.patch
          13 kB
          Simon Willnauer

            People

              Assignee: mikemccand Michael McCandless
              Reporter: mikemccand Michael McCandless