Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.6, 4.0-ALPHA
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Adds a per-segment Bloom filter for selected fields so that term searches can fail fast when a term is definitely absent from a segment, avoiding wasted disk access.

      Best suited to low-frequency fields, e.g. primary keys on big indexes with many segments, but it also speeds up general searching in my tests.
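
      To illustrate the fast-fail idea, here is a minimal, self-contained sketch in plain Java. It is not the code from the attached patches: the class names SegmentBloomFilter and FilteredSegment, the FNV-1a hash, the double-hashing scheme, and the in-memory map standing in for the on-disk term dictionary are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-segment Bloom filter; not the class from the attached patches. */
final class SegmentBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SegmentBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Seeded 32-bit FNV-1a hash (an assumed choice; any well-mixed hash would do). */
    private static int hash(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    /** Derives the i-th bit position from two base hashes (double hashing). */
    private int position(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String term) {
        byte[] t = term.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(t, 0);
        int h2 = hash(t, 0x9E3779B9);
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(h1, h2, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String term) {
        byte[] t = term.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(t, 0);
        int h2 = hash(t, 0x9E3779B9);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }
}

/** Hypothetical segment: a map simulates the on-disk term dictionary. */
final class FilteredSegment {
    private final SegmentBloomFilter filter;
    private final Map<String, Integer> termDictionary = new HashMap<>();
    int diskLookups = 0; // counts simulated disk accesses

    FilteredSegment(int numBits, int numHashes) {
        this.filter = new SegmentBloomFilter(numBits, numHashes);
    }

    void index(String term, int docId) {
        filter.add(term);
        termDictionary.put(term, docId);
    }

    /** Returns the doc id for the term, or -1 if this segment does not contain it. */
    int lookup(String term) {
        if (!filter.mightContain(term)) {
            return -1; // fast-fail: the simulated disk is never touched
        }
        diskLookups++; // filter said "maybe", so pay for the real lookup
        return termDictionary.getOrDefault(term, -1);
    }
}
```

      In this sketch a lookup against a segment that has never seen the key returns without incrementing diskLookups, which is the saving the feature is after when a primary key has to be checked against many segments. A Bloom filter never gives false negatives, so the only cost of a false positive is one ordinary lookup.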

      Overview slideshow here: http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments

      Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU

      Patch based on 3.6 codebase attached.
      There are currently no 3.6 API changes: to try it out, just add a field whose name ends in "_blm" to invoke the special indexing/querying capability. Clearly, a new Field type or schema declaration would need to be added to the APIs to configure the service properly.
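
      The naming convention can be pictured with a small stdlib-only sketch. The helper class BloomFieldNaming and its methods are hypothetical and are not taken from the patch; they only show how a "_blm" suffix could be used to decide which fields get a Bloom filter.

```java
/** Hypothetical helper mirroring the "_blm" field-name convention; not part of the patch. */
final class BloomFieldNaming {
    static final String BLOOM_SUFFIX = "_blm";

    /** True if the field name opts in to Bloom filtering via its suffix. */
    static boolean usesBloomFilter(String fieldName) {
        return fieldName != null && fieldName.endsWith(BLOOM_SUFFIX);
    }

    /** Builds the opted-in field name for a base name, e.g. "id" becomes "id_blm". */
    static String bloomFieldFor(String baseName) {
        return baseName + BLOOM_SUFFIX;
    }
}
```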

      Also attached is a patch for the Lucene 4.0 codebase that introduces a new PostingsFormat.

        Attachments

        1. BloomFilterPostingsBranch4x.patch
          54 kB
          Mark Harwood
        2. 4069Failure.zip
          10 kB
          Mark Harwood
        3. LUCENE-4203.patch
          3 kB
          Michael McCandless
        4. PKLookupUpdatePerfTest.java
          19 kB
          Michael McCandless
        5. PKLookupUpdatePerfTest.java
          19 kB
          Mark Harwood
        6. LUCENE-4069-tryDeleteDocument.patch
          3 kB
          Michael McCandless
        7. PKLookupUpdatePerfTest.java
          19 kB
          Michael McCandless
        8. PKLookupUpdatePerfTest.java
          9 kB
          Mark Harwood
        9. PrimaryKeyPerfTest40.java
          10 kB
          Mark Harwood
        10. MHBloomFilterOn3.6Branch.patch
          17 kB
          Mark Harwood

              People

              • Assignee:
                markh Mark Harwood
                Reporter:
                markh Mark Harwood
              • Votes:
                4
                Watchers:
                8
