Lucene - Core / LUCENE-4069

Segment-level Bloom filters


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.6, 4.0-ALPHA
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      An addition to each segment that stores a Bloom filter for selected fields, so that term searches can fail fast when a term is definitely absent from the segment, helping to avoid wasted disk access.

      Best suited to low-frequency fields, e.g. primary keys on big indexes with many segments, but it also speeds up general searching in my tests.
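
      The fast-fail idea can be illustrated with a minimal, self-contained sketch. This is not the code from the attached patches: the class names (`SegmentBloomSketch`, `BloomFilter`, `Segment`), the filter size, the hash function and the in-memory set standing in for the on-disk term dictionary are all assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SegmentBloomSketch {

    /** Fixed-size Bloom filter; bit positions are derived by double hashing. */
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        void add(String term) {
            int h1 = hash(term, 0x9747b28c);
            int h2 = hash(term, 0x85ebca6b);
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(h1, h2, i));
            }
        }

        /** false means "definitely absent"; true means "possibly present". */
        boolean mightContain(String term) {
            int h1 = hash(term, 0x9747b28c);
            int h2 = hash(term, 0x85ebca6b);
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(h1, h2, i))) {
                    return false;
                }
            }
            return true;
        }

        private int index(int h1, int h2, int i) {
            return Math.floorMod(h1 + i * h2, numBits);
        }

        // Seeded FNV-1a style mixing over the UTF-8 bytes of the term.
        private static int hash(String term, int seed) {
            int h = seed;
            for (byte b : term.getBytes(StandardCharsets.UTF_8)) {
                h = (h ^ (b & 0xff)) * 16777619;
            }
            return h;
        }
    }

    /** Hypothetical segment: the HashSet stands in for the on-disk term dictionary. */
    static final class Segment {
        private final Set<String> termsOnDisk = new HashSet<>();
        private final BloomFilter filter = new BloomFilter(1 << 16, 4);
        int diskLookups = 0;

        void index(String term) {
            termsOnDisk.add(term);
            filter.add(term);
        }

        boolean contains(String term) {
            if (!filter.mightContain(term)) {
                return false; // fast-fail: no "disk" access needed
            }
            diskLookups++; // only reached when the filter cannot rule the term out
            return termsOnDisk.contains(term);
        }
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        for (int s = 0; s < 10; s++) {
            Segment seg = new Segment();
            for (int k = 0; k < 100; k++) {
                seg.index("pk-" + s + "-" + k);
            }
            segments.add(seg);
        }
        String key = "pk-7-42";
        boolean found = false;
        int lookups = 0;
        for (Segment seg : segments) {
            if (seg.contains(key)) {
                found = true;
            }
            lookups += seg.diskLookups;
        }
        System.out.println("found=" + found + ", simulated disk lookups=" + lookups
                + " across " + segments.size() + " segments");
    }
}
```

      A primary-key lookup normally has to probe every segment; in this sketch a segment is only "read" when its filter cannot rule the key out, which is why the benefit grows with the number of segments. A Bloom filter can return false positives but never false negatives, so correctness is unaffected.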

      Overview slideshow here: http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments

      Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU

      A patch based on the 3.6 codebase is attached.
      There are currently no 3.6 API changes: to experiment, just add a field whose name ends in "_blm" to invoke the special indexing/querying capability. Clearly, a new Field or schema declaration would need to be added to the APIs to configure the service properly.
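
      A name-suffix convention of this kind might be pictured roughly as below. This is only an illustrative sketch and not code from the patch: the class `BloomFieldConvention` and its methods are hypothetical names, and only the "_blm" suffix itself comes from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BloomFieldConvention {

    // Suffix taken from the issue description; everything else is hypothetical.
    static final String BLOOM_SUFFIX = "_blm";

    /** Returns true when a field name opts in to Bloom filtering by its suffix. */
    static boolean isBloomField(String fieldName) {
        return fieldName != null && fieldName.endsWith(BLOOM_SUFFIX);
    }

    /** Selects the field names that would get a per-segment Bloom filter. */
    static List<String> bloomFields(List<String> fieldNames) {
        List<String> selected = new ArrayList<>();
        for (String name : fieldNames) {
            if (isBloomField(name)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id_blm", "title", "body");
        System.out.println("Bloom-filtered fields: " + bloomFields(fields));
    }
}
```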

      Also attached is a patch for the Lucene 4.0 codebase, which introduces a new PostingsFormat.

      Attachments

        1. PrimaryKeyPerfTest40.java (10 kB, Mark Harwood)
        2. PKLookupUpdatePerfTest.java (9 kB, Mark Harwood)
        3. PKLookupUpdatePerfTest.java (19 kB, Michael McCandless)
        4. PKLookupUpdatePerfTest.java (19 kB, Mark Harwood)
        5. PKLookupUpdatePerfTest.java (19 kB, Michael McCandless)
        6. MHBloomFilterOn3.6Branch.patch (17 kB, Mark Harwood)
        7. LUCENE-4203.patch (3 kB, Michael McCandless)
        8. LUCENE-4069-tryDeleteDocument.patch (3 kB, Michael McCandless)
        9. BloomFilterPostingsBranch4x.patch (54 kB, Mark Harwood)
        10. 4069Failure.zip (10 kB, Mark Harwood)


          People

            Assignee: Mark Harwood (mharwood)
            Reporter: Mark Harwood (mharwood)
            Votes: 4
            Watchers: 8
