Apache HAWQ / HAWQ-1597

Implement Runtime Filter for Hash Join

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0.0
    • Component/s: Query Execution
    • Labels: None

    Description

      A Bloom filter is a space-efficient probabilistic data structure, invented in 1970, that tests whether an element is a member of a set. It can report false positives but never false negatives.
      Nowadays, Bloom filters are widely used in OLAP and other data-intensive applications to filter data quickly, and OLAP systems usually apply them to hash joins. The basic idea is as follows: when hash-joining two tables, the build phase constructs a Bloom filter over the join keys of the inner table and pushes it down to the scan of the outer table. The scan then returns fewer tuples to the hash join node to be probed against the hash table. This can greatly improve hash join performance when the join is highly selective, that is, when many outer tuples have no match in the inner table.
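
      The idea described above can be sketched in a few lines of Python. This is only an illustration, not HAWQ's implementation: the class `BloomFilter`, the function `hash_join_with_runtime_filter`, the filter size, the number of hash functions, and the use of BLAKE2b as the hash are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical sizes; not HAWQ's actual layout)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        data = repr(key).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def hash_join_with_runtime_filter(inner_rows, outer_rows, inner_key, outer_key):
    """Join two lists of tuples; keys are given as column indexes."""
    # Build phase: fill the hash table and the Bloom filter together.
    hash_table = {}
    bloom = BloomFilter()
    for row in inner_rows:
        key = row[inner_key]
        hash_table.setdefault(key, []).append(row)
        bloom.add(key)

    # Probe phase: the "scan" drops outer tuples the filter rules out,
    # so only the survivors are probed against the hash table.
    scanned = passed = 0
    results = []
    for row in outer_rows:
        scanned += 1
        key = row[outer_key]
        if not bloom.might_contain(key):
            continue
        passed += 1
        for match in hash_table.get(key, []):
            results.append((row, match))
    return results, scanned, passed


inner = [(1, "a"), (2, "b")]
outer = [(i, "o%d" % i) for i in range(100)]
joined, scanned, passed = hash_join_with_runtime_filter(inner, outer, 0, 0)
print(len(joined), "joined rows;", passed, "of", scanned, "outer tuples passed the filter")
```

      Because the filter can produce false positives, the hash table lookup is still required for correctness. The filter only reduces how many outer tuples reach that lookup.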

      Attachments

        1. image-2018-09-14-17-54-06-150.png
          89 kB
          Kuien Liu
        2. q17_modified_hawq.gif
          42 kB
          Wen Lin
        3. 111BA854-7318-46A7-8338-5F2993D60FA3.png
          89 kB
          Wen Lin
        4. HAWQ Runtime Filter Design.pdf
          409 kB
          Wen Lin
        5. HAWQ Runtime Filter Design.pdf
          357 kB
          Wen Lin

          People

            Assignee: Wen Lin
            Reporter: Wen Lin
