Apache NiFi / NIFI-3644

Add DetectDuplicateUsingHBase processor

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: Extensions
    • Labels: None

    Description

      The DetectDuplicate processor uses a distributed map cache to maintain a list of unique file identifiers (such as hashes).

      An HBase table could provide the same map cache functionality, which would allow a very large volume of file identifiers and audit information to be stored reliably. The downside is that this approach requires HBase.

      Storing the unique file identifiers reliably and in a queryable form, along with some audit information, benefits several use cases.
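
      The idea above can be illustrated with a minimal sketch. It is not the processor's actual implementation: IdentifierStore, InMemoryStore and DuplicateDetectorSketch are hypothetical names, and an in-memory map stands in for the HBase table so the example runs without a cluster. An HBase-backed store would presumably need an equivalent atomic "store only if the row is absent" operation so that two concurrent flow files with the same identifier are not both treated as new.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: all names here are hypothetical and not part of NiFi or HBase.
class DuplicateDetectorSketch {

    /** Hypothetical abstraction over the backing store (an HBase table in the proposal). */
    public interface IdentifierStore {
        /**
         * Atomically stores auditInfo under identifier if no entry exists yet.
         * Returns the previously stored audit info, or null if the identifier was new.
         */
        String putIfAbsent(String identifier, String auditInfo);
    }

    /** In-memory stand-in for the HBase table (assumption, for illustration only). */
    public static class InMemoryStore implements IdentifierStore {
        private final Map<String, String> rows = new ConcurrentHashMap<>();

        @Override
        public String putIfAbsent(String identifier, String auditInfo) {
            return rows.putIfAbsent(identifier, auditInfo);
        }
    }

    private final IdentifierStore store;

    public DuplicateDetectorSketch(IdentifierStore store) {
        this.store = store;
    }

    /**
     * Returns true if the identifier has been seen before. On first sight the
     * identifier is recorded together with its audit information.
     */
    public boolean isDuplicate(String identifier, String auditInfo) {
        return store.putIfAbsent(identifier, auditInfo) != null;
    }

    public static void main(String[] args) {
        DuplicateDetectorSketch detector = new DuplicateDetectorSketch(new InMemoryStore());
        // The same hash is submitted twice; only the second call reports a duplicate.
        System.out.println(detector.isDuplicate("hash-1", "first arrival"));
        System.out.println(detector.isDuplicate("hash-1", "second arrival"));
    }
}
```

      Keeping the first audit entry rather than overwriting it is a design choice in this sketch: the stored row then records when and how an identifier was first seen, which matches the auditing use case described above.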

            People

              Assignee: Bryan Bende (bbende)
              Reporter: Bjorn Olsen (bjorn.olsen1@gmail.com)
              Votes: 0
              Watchers: 5
