Flink / FLINK-10993

Bring bloomfilter as a public API

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: API / DataStream
    • Labels: None

    Description

      Flink ships an internal BloomFilter implementation, but it is used only for internal optimization and is not exposed through any public API.

      This was discussed earlier on the user mailing list: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Bloom-filter-in-Flink-td10608.html

      Many users need to detect duplicates in streaming computations, so I think it would make sense to provide such an API.

      In addition, Spark already provides BloomFilter as a public API:

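      // Build a Bloom filter over column "dd": dataLen expected items, 1% false-positive rate
      val bf = df.stat.bloomFilter("dd", dataLen, 0.01)
      // Probe the filter for each RDD element (true means "possibly present")
      val rightNum = rdd.map(x => (x.toInt, bf.mightContainString(x)))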
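
      To make the duplicate-detection use case above concrete, here is a minimal sketch of a Bloom filter in plain Java. It is an illustration only: it is not Flink's internal BloomFilter and not a proposed API. The class name SimpleBloomFilter, its constructor parameters, and the methods put and mightContain are all hypothetical, and the hash function was chosen for brevity rather than quality.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical illustration; NOT Flink's internal BloomFilter. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(long expectedItems, double fpp) {
        if (expectedItems <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("expectedItems must be > 0 and 0 < fpp < 1");
        }
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        double m = -expectedItems * Math.log(fpp) / (Math.log(2) * Math.log(2));
        this.numBits = (int) Math.max(64, Math.min(Integer.MAX_VALUE, Math.ceil(m)));
        this.numHashes =
                Math.max(1, (int) Math.round((double) numBits / expectedItems * Math.log(2)));
        this.bits = new BitSet(numBits);
    }

    /** Adds the value. Returns true if it was definitely not present before. */
    boolean put(String value) {
        long h = hash64(value);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1; // odd step so the probes do not collapse to one index
        boolean changed = false;
        for (int i = 0; i < numHashes; i++) {
            int idx = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(idx)) {
                bits.set(idx);
                changed = true;
            }
        }
        return changed;
    }

    /** Returns false if the value was never added; true means "possibly added". */
    boolean mightContain(String value) {
        long h = hash64(value);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1;
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                return false;
            }
        }
        return true;
    }

    // 64-bit FNV-1a followed by one extra mixing step (an assumption made for brevity).
    private static long hash64(String value) {
        long h = 0xcbf29ce484222325L;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return h;
    }
}
```

      In a deduplicating operator, one would call put(key) for each record and forward the record only when it returns true. False positives mean a small fraction of first-time records may be dropped as suspected duplicates, which is the usual trade-off of this data structure.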
      

       

       

            People

              Assignee: vinoyang (yanghua)
              Reporter: vinoyang (yanghua)
              Votes: 0
              Watchers: 4
