[OAK-7090] Use Bloom filters for composite data store blob ID lookup table - ASF JIRA

Details

Type: Technical task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: blob, blob-cloud, blob-cloud-azure, blob-plugins
Labels: None

Description

The composite data store attempts to keep a mapping from each blob ID to the delegates where that blob ID should be found. We should use Bloom filters to make this mapping more efficient.
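To illustrate the idea, here is a minimal sketch of a Bloom filter over blob IDs, assuming one filter per delegate. The class name `BlobIdBloomFilter`, its constructor parameters, and the choice of hash functions (CRC32 combined with `String.hashCode` via double hashing) are assumptions made for illustration only; none of this is existing Oak API.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.zip.CRC32;

// Hypothetical sketch, not Oak API: a Bloom filter over the blob IDs of one delegate.
final class BlobIdBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BlobIdBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // First base hash: CRC32 of the UTF-8 bytes (always non-negative).
    private static long firstHash(String blobId) {
        CRC32 crc = new CRC32();
        crc.update(blobId.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Second base hash: String.hashCode as an unsigned value, forced odd so it is never zero.
    private static long secondHash(String blobId) {
        return (blobId.hashCode() & 0xffffffffL) | 1L;
    }

    // Double hashing: the i-th bit index is (h1 + i * h2) mod numBits.
    private int index(long h1, long h2, int i) {
        return (int) ((h1 + (long) i * h2) % numBits);
    }

    // Records a blob ID as present in this delegate.
    void add(String blobId) {
        long h1 = firstHash(blobId);
        long h2 = secondHash(blobId);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    // false means the blob ID was definitely never added;
    // true means it may have been added (false positives are possible).
    boolean mightContain(String blobId) {
        long h1 = firstHash(blobId);
        long h2 = secondHash(blobId);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Under this assumption, a lookup would query each delegate's filter and skip any delegate whose filter returns false. A true result would still need to be confirmed against the delegate itself, since it may be a false positive.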

There are a couple of challenges with implementing Bloom filters for this purpose.

1. Determining the appropriate size of the Bloom filter. Assuming OAK-7089 is completed before this one, we should have a reasonable guess as to the number of blob IDs at startup time, but this number may change over time. We may need a task that rebuilds the table at a more appropriate size once it becomes too full (that is, once it produces too many false positives).
2. Handling deletions. Once a record has been deleted, the corresponding blob ID may also need to be removed from the filter (using an algorithm similar to data store GC). Bloom filters don't typically handle deletions, though. This may require something like an Invertible Bloom Filter, or it may be as simple as rebuilding the Bloom filter when data store GC runs.
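The sizing question can be made concrete with the standard Bloom filter formulas: for n expected items and target false-positive probability p, the bit count is m = -n ln p / (ln 2)^2 and the hash count is k = (m / n) ln 2; after n' insertions the expected false-positive probability is approximately (1 - e^(-k n' / m))^k. The sketch below applies these formulas. The class and method names and the rebuild threshold are hypothetical, and the issue does not specify a target false-positive rate.

```java
// Hypothetical helper, not Oak API: standard Bloom filter sizing formulas.
final class BloomFilterSizing {
    private BloomFilterSizing() {
    }

    // Bits needed for the expected item count and target false-positive probability:
    // m = ceil(-n * ln(p) / (ln 2)^2)
    static long optimalNumBits(long expectedItems, double targetFpp) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedItems * Math.log(targetFpp) / (ln2 * ln2));
    }

    // Number of hash functions: k = round((m / n) * ln 2), at least 1.
    static int optimalNumHashes(long expectedItems, long numBits) {
        double k = (double) numBits / expectedItems * Math.log(2);
        return Math.max(1, (int) Math.round(k));
    }

    // Approximate false-positive probability after insertedItems insertions:
    // (1 - e^(-k * n' / m))^k
    static double estimatedFpp(long insertedItems, long numBits, int numHashes) {
        double fractionSet = 1 - Math.exp(-(double) numHashes * insertedItems / numBits);
        return Math.pow(fractionSet, numHashes);
    }

    // True once the estimated false-positive probability exceeds the tolerated maximum,
    // i.e. the filter has become "too full" and should be rebuilt at a larger size.
    static boolean needsRebuild(long insertedItems, long numBits, int numHashes, double maxFpp) {
        return estimatedFpp(insertedItems, numBits, numHashes) > maxFpp;
    }
}
```

For example, with one million expected blob IDs and a 1% target, these formulas give roughly 9.6 bits per blob ID and 7 hash functions. A periodic task, or the data store GC run, could call `needsRebuild` with the current blob ID count to decide whether to build a fresh filter, which would also drop the IDs of deleted records.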

People

Assignee: Unassigned

Reporter: Matt Ryan

Votes: 0

Watchers: 1

Dates

Created: 19/Dec/17 18:58

Updated: 19/Dec/17 18:58