Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
Description
The DetectDuplicate processor uses a distributed map cache to maintain a list of unique file identifiers (such as hashes).
An HBase table could provide this distributed map cache functionality, which would allow a very large volume of file identifiers and auditing information to be stored reliably. The downside of this approach is that it requires HBase.
Storing the unique file identifiers in a reliable, queryable manner, along with some audit information, benefits several use cases.
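To make the idea concrete, here is a minimal sketch of the duplicate check described above, assuming a cache that atomically stores an identifier only when it is absent. Every name in it (IdentifierCache, AuditEntry, InMemoryIdentifierCache, isDuplicate) is hypothetical and is not the NiFi cache client API. The in-memory map only stands in for the proposed HBase table, where the identifier might serve as the row key and the audit fields as columns.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class DuplicateDetectionSketch {

    /** Hypothetical audit record kept alongside each file identifier. */
    static final class AuditEntry {
        final String source;      // where the file was first seen (assumed field)
        final Instant firstSeen;  // when it was first seen (assumed field)

        AuditEntry(String source, Instant firstSeen) {
            this.source = source;
            this.firstSeen = firstSeen;
        }
    }

    /** Hypothetical cache contract; an HBase-backed class could implement it. */
    interface IdentifierCache {
        /** Stores the entry if the identifier is new; returns the existing entry otherwise. */
        Optional<AuditEntry> getAndPutIfAbsent(String identifier, AuditEntry entry);
    }

    /** In-memory stand-in for the HBase table, for illustration only. */
    static final class InMemoryIdentifierCache implements IdentifierCache {
        private final Map<String, AuditEntry> entries = new ConcurrentHashMap<>();

        @Override
        public Optional<AuditEntry> getAndPutIfAbsent(String identifier, AuditEntry entry) {
            // putIfAbsent returns the previous value, or null if the key was new.
            return Optional.ofNullable(entries.putIfAbsent(identifier, entry));
        }
    }

    /** Derives a file identifier by hashing the content with SHA-256. */
    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Returns true if content with the same hash was already recorded in the cache. */
    static boolean isDuplicate(IdentifierCache cache, byte[] content, String source) {
        AuditEntry candidate = new AuditEntry(source, Instant.now());
        return cache.getAndPutIfAbsent(sha256Hex(content), candidate).isPresent();
    }

    public static void main(String[] args) {
        IdentifierCache cache = new InMemoryIdentifierCache();
        byte[] content = "example file content".getBytes(StandardCharsets.UTF_8);
        System.out.println(isDuplicate(cache, content, "feed-a")); // false: first sighting
        System.out.println(isDuplicate(cache, content, "feed-b")); // true: same hash seen before
    }
}
```

Because the check and the write happen in a single call, two nodes that see the same file at the same time cannot both treat it as new, provided the backing store offers an equivalent atomic operation.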