Lucene - Core / LUCENE-4069

Segment-level Bloom filters

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.6, 4.0-ALPHA
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      Adds a Bloom filter to each segment for selected fields so that term searches can fail fast: if the filter reports a term as absent, the segment's term dictionary is never consulted, which avoids wasted disk access.

      Best suited to low-frequency fields, e.g. primary keys on big indexes with many segments, but it also sped up general searching in my tests.
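
      To make the fast-fail idea concrete, here is a minimal standalone sketch. It is not the code from the attached patches: the names SegmentBloomSketch, BloomFilter and Segment are hypothetical, an in-memory map stands in for the on-disk term dictionary, and the hash function and filter sizes are arbitrary assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: hypothetical names, not the classes from the patches. */
class SegmentBloomSketch {

    /** Minimal Bloom filter; bit positions come from two FNV-1a style hashes. */
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        void add(String term) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(position(term, i));
            }
        }

        /** false means definitely absent; true means possibly present. */
        boolean mightContain(String term) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(position(term, i))) {
                    return false;
                }
            }
            return true;
        }

        // Double hashing: the i-th position is (h1 + i * h2) mod numBits.
        private int position(String term, int i) {
            byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
            int h1 = hash(bytes, 0x811C9DC5);
            int h2 = hash(bytes, 0x9747B28C);
            return Math.floorMod(h1 + i * h2, numBits);
        }

        private static int hash(byte[] data, int seed) {
            int h = seed;
            for (byte b : data) {
                h ^= (b & 0xff);
                h *= 16777619;
            }
            return h;
        }
    }

    /** One segment: a map stands in for the on-disk term dictionary. */
    static final class Segment {
        private final Map<String, Integer> termDictionary = new HashMap<>();
        private final BloomFilter filter;
        int dictionaryLookups = 0; // counts the simulated "disk" accesses

        Segment(int numBits, int numHashes) {
            this.filter = new BloomFilter(numBits, numHashes);
        }

        void index(String key, int docId) {
            termDictionary.put(key, docId);
            filter.add(key);
        }

        /** Returns the doc id, or null if the key is not in this segment. */
        Integer lookup(String key) {
            if (!filter.mightContain(key)) {
                return null; // fast-fail: the term dictionary is never touched
            }
            dictionaryLookups++;
            return termDictionary.get(key);
        }
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        for (int s = 0; s < 3; s++) {
            Segment segment = new Segment(4096, 3);
            for (int i = 0; i < 100; i++) {
                segment.index("pk-" + s + "-" + i, i);
            }
            segments.add(segment);
        }
        Integer hit = null;
        int consulted = 0;
        for (Segment segment : segments) {
            Integer docId = segment.lookup("pk-2-42");
            if (docId != null) {
                hit = docId;
            }
            consulted += segment.dictionaryLookups;
        }
        System.out.println("docId=" + hit + ", dictionaries consulted=" + consulted
                + " of " + segments.size());
    }
}
```

      A primary key lives in at most one segment, so with many segments most lookups hit a filter that answers "definitely absent" and skip the dictionary. A false positive only costs one unnecessary dictionary lookup; it never changes the result.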

      Overview slideshow here: http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments

      Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU

      Patch based on the 3.6 codebase attached.
      There are currently no 3.6 API changes. To try it out, just add a field whose name ends in "_blm" to invoke the special indexing/querying capability. Clearly a new Field or schema declaration would need adding to the APIs to configure the service properly.

      Also attached: a patch for the Lucene 4.0 codebase that introduces a new PostingsFormat.

      Attachments

        1. 4069Failure.zip
          10 kB
          Mark Harwood
        2. BloomFilterPostingsBranch4x.patch
          54 kB
          Mark Harwood
        3. LUCENE-4069-tryDeleteDocument.patch
          3 kB
          Michael McCandless
        4. LUCENE-4203.patch
          3 kB
          Michael McCandless
        5. MHBloomFilterOn3.6Branch.patch
          17 kB
          Mark Harwood
        6. PKLookupUpdatePerfTest.java
          19 kB
          Michael McCandless
        7. PKLookupUpdatePerfTest.java
          19 kB
          Mark Harwood
        8. PKLookupUpdatePerfTest.java
          19 kB
          Michael McCandless
        9. PKLookupUpdatePerfTest.java
          9 kB
          Mark Harwood
        10. PrimaryKeyPerfTest40.java
          10 kB
          Mark Harwood

            People

              Assignee: Mark Harwood (mharwood)
              Reporter: Mark Harwood (mharwood)
              Votes: 4
              Watchers: 8
