  1. Flink
  2. FLINK-10993

Expose BloomFilter as a public API

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: API / DataStream
    • Labels:
      None

      Description

      Flink ships an internal BloomFilter implementation, but it is used only for internal optimizations and is not exposed through any public API.

      This was discussed earlier on the user mailing list: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Bloom-filter-in-Flink-td10608.html

      Many users need to detect duplicates in streaming jobs, so I think it would make sense to provide such an API.
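      As a rough illustration of the duplicate-detection use case, here is a minimal standalone sketch in Java. It is not Flink's internal implementation and not a proposed API: the names SimpleBloomFilter, DedupExample, and firstOccurrences, the FNV-1a based double hashing, and the sizing parameters (1,000 expected items, 1% false-positive probability) are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical standalone Bloom filter; NOT Flink's internal class. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(long expectedItems, double fpp) {
        if (expectedItems <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("expectedItems > 0 and 0 < fpp < 1 required");
        }
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 bits
        double m = -expectedItems * Math.log(fpp) / (Math.log(2) * Math.log(2));
        this.numBits = (int) Math.max(64, Math.min(Integer.MAX_VALUE, Math.ceil(m)));
        // Standard hash count: k = (m / n) * ln 2
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedItems * Math.log(2)));
        this.bits = new BitSet(numBits);
    }

    /** Records the key by setting all of its bit positions. */
    void put(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // assumed second seed; forced odd so it is never zero
        for (int i = 0; i < numHashes; i++) {
            bits.set(Math.floorMod(h1 + i * h2, numBits));
        }
    }

    /** Returns false if the key was definitely never added, true if it may have been. */
    boolean mightContain(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1;
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                return false;
            }
        }
        return true;
    }

    /** 32-bit FNV-1a style hash with a caller-supplied starting value. */
    private static int fnv1a(byte[] data, int seed) {
        int hash = seed;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 16777619;
        }
        return hash;
    }
}

/** Hypothetical helper showing approximate deduplication with the filter. */
final class DedupExample {
    /** Keeps only elements the filter has definitely not seen before. */
    static List<String> firstOccurrences(List<String> events) {
        SimpleBloomFilter seen = new SimpleBloomFilter(1_000, 0.01);
        List<String> out = new ArrayList<>();
        for (String e : events) {
            if (!seen.mightContain(e)) {
                seen.put(e);
                out.add(e);
            }
        }
        return out;
    }
}
```

      Because a Bloom filter can report false positives but never false negatives, this style of deduplication never emits a true duplicate, but it may drop a small fraction of genuinely new elements, roughly bounded by the configured false-positive probability.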

      In addition, Spark already provides BloomFilter as a public API:

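      // Build a Bloom filter over column "dd": dataLen expected items, 1% false-positive rate
      val bf = df.stat.bloomFilter("dd", dataLen, 0.01)
      // Pair each value with whether the filter might contain it
      val rightNum = rdd.map(x => (x.toInt, bf.mightContainString(x)))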
      

       

       

              People

              • Assignee: yanghua vinoyang
              • Reporter: yanghua vinoyang
              • Votes: 0
              • Watchers: 4
