  1. Lucene - Core
  2. LUCENE-2723

Speed up Lucene's low level bulk postings read API

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 4.1
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from LUCENE-1410.

      The flex DocsEnum has a simple bulk-read API that reads the next chunk
      of docs/freqs. But it's a poor fit for intblock codecs like FOR/PFOR
      (from LUCENE-1410). This is not unlike sucking coffee through those
      tiny plastic coffee stirrers they hand out on airplanes that,
      surprisingly, also happen to function as a straw.

      As a result we see no perf gain from using FOR/PFOR.

      I had hacked up a fix for this, described in my blog post at
      http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html

      I'm opening this issue to get that work to a committable point.

      So... I've worked out a new bulk-read API to address this performance
      bottleneck. It has some big changes over the current bulk-read API:

      • You can now also bulk-read positions (but not payloads), though I
        have yet to cut over positional queries.
      • The buffer contains deltas, not absolute values, for docIDs
        and positions (freqs are absolute).
      • Deleted docs are not filtered out.
      • The doc & freq buffers need not be "aligned". For fixed intblock
        codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
        Group varint, etc.) they won't be.
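
      To illustrate what the second and third points mean for a caller,
      here is a minimal sketch, not the API from the patches. The class
      name, method name, and parameters are all hypothetical, and it
      assumes the running docID starts from the last doc of the previous
      chunk (0 for the first chunk). The consumer sums the deltas itself
      and applies its own deleted-docs check:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical consumer of a bulk-read buffer; none of these names come from Lucene.
class BulkDeltaSketch {

    /**
     * Converts a buffer of doc deltas into absolute docIDs and drops deleted docs.
     *
     * @param docDeltas buffer filled by the (assumed) bulk reader; holds deltas, not docIDs
     * @param count     number of valid entries in the buffer
     * @param lastDoc   last docID decoded from the previous chunk (assumed 0 at the start)
     * @param deleted   bit set of deleted docIDs, or null if nothing is deleted
     */
    static List<Integer> collectLiveDocs(int[] docDeltas, int count, int lastDoc, BitSet deleted) {
        List<Integer> live = new ArrayList<>();
        int doc = lastDoc;
        for (int i = 0; i < count; i++) {
            doc += docDeltas[i];                  // the buffer holds deltas, so accumulate
            if (deleted == null || !deleted.get(doc)) {
                live.add(doc);                    // the reader did not filter deletions; we do
            }
        }
        return live;
    }

    public static void main(String[] args) {
        int[] deltas = {3, 2, 5, 1};              // absolute docIDs: 3, 5, 10, 11
        BitSet deleted = new BitSet();
        deleted.set(10);                          // pretend doc 10 was deleted
        System.out.println(collectLiveDocs(deltas, deltas.length, 0, deleted)); // prints [3, 5, 11]
    }
}
```

      Pushing the delta sum and the deletion check into the caller keeps
      the reader's job to decoding whole blocks, which is presumably where
      intblock codecs like FOR/PFOR gain their speed.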

      It's still a work in progress...

        Attachments

        1. LUCENE-2723-BulkEnumWrapper.patch
          13 kB
          Simon Willnauer
        2. LUCENE-2723.patch
          4 kB
          Simon Willnauer
        3. LUCENE-2723_bulkvint.patch
          10 kB
          Robert Muir
        4. LUCENE-2723_facetPerSeg.patch
          33 kB
          Yonik Seeley
        5. LUCENE-2723_facetPerSeg.patch
          8 kB
          Yonik Seeley
        6. LUCENE-2723_openEnum.patch
          0.8 kB
          Yonik Seeley
        7. LUCENE-2723_wastedint.patch
          23 kB
          Robert Muir
        8. LUCENE-2723-termscorer.patch
          20 kB
          Simon Willnauer
        9. LUCENE-2723-termscorer.patch
          20 kB
          Simon Willnauer
        10. LUCENE-2723-termscorer.patch
          21 kB
          Simon Willnauer
        11. LUCENE-2723_termscorer.patch
          16 kB
          Robert Muir
        12. LUCENE-2723.patch
          153 kB
          Simon Willnauer
        13. LUCENE-2723.patch
          153 kB
          Robert Muir
        14. LUCENE-2723.patch
          170 kB
          Simon Willnauer
        15. LUCENE-2723.patch
          401 kB
          Michael McCandless
        16. LUCENE-2723.patch
          396 kB
          Michael McCandless

              People

              • Assignee:
                mikemccand Michael McCandless
                Reporter:
                mikemccand Michael McCandless