Uploaded image for project: 'Commons Collections'
  1. Commons Collections
  2. COLLECTIONS-801

Simplify Bloom filter implementation

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 4.5.0-M1
    • 4.5.0-M1
    • None
    • None

    Description

      The initial Bloom filter implementation has a number of issues arising from attempting to verify that filters were built the same way (ie. same number of bits, number of functions, hashing algorithm, etc.)  The net result is that there is significant overhead in the processing of the filters.  For a structure that is intended to speed up decisions this is a fault.  There are also issues with implementation hiding in the current code.

      The issue calls for:

      • The removal of the hashing tracking, all the registration and verification that the hashes are the same.  This becomes an operational issue for the developer using the library.  In most cases this is a trivial problem.
      • Removal of all methods that assume an implementation detail of the Bloom filter.  Rather than getting a list of indices or a list of bitmap longs (or bytes) the classes will implement a "Producer" style that uses IntConsumer, LongConsumer, etc to receive the indices or bit maps.  This pattern to carry forward to specialized filters like counting Bloom filters that will accept a BitCountConsumer to receive the counts for each bit.
      • There are cases where Bloom filters need to be updated and cases where they do not.  The change should include merge() methods to produce new Bloom filters and mergeInPlace() methods to update the Bloom filters.
      • There are issues with some assumptions in the Hasher implementations wherein the static hasher does not function as described.  Part of the goal of this change is to simplify the mechanism to pass the internal state of one bloom filter to another without either knowing the implementation of that state in the other.  This will be done via the "Producer" interface described above.  Hashers will be able to create IndexProducers that will produce the indices for a Bloom filter shape.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            claude Claude Warren
            claude Claude Warren
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment