Flink / FLINK-2240

Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels: None

      Description

      In Hybrid-Hash-Join, when the small (build-side) table does not fit into memory, part of its data is spilled to disk, and the corresponding partitions of the big (probe-side) table are spilled to disk during the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big-table records that would otherwise be spilled, we can drop probe records whose keys cannot match any spilled build record. This may greatly reduce the size of the spilled big-table files and save the disk I/O cost of writing them and reading them back.
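
A minimal sketch of the idea described above, not the code that was merged for this issue. The class and method names (`SpillBloomFilterSketch`, `SimpleBloomFilter`, `buildFilter`, `probeKeysToSpill`) are hypothetical, keys are simplified to `int`, and the filter size and hash count are arbitrary. It assumes an inner join, where a probe record with no matching build record produces no output and can be dropped safely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration only; not Flink's actual hash-join implementation. */
final class SpillBloomFilterSketch {

    /** Minimal Bloom filter over int keys; bit positions are derived by double hashing. */
    public static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        public SimpleBloomFilter(int numBits, int numHashes) {
            if (numBits <= 0 || numHashes <= 0) {
                throw new IllegalArgumentException("numBits and numHashes must be positive");
            }
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        /** Build phase: called for the key of every build-side record that is spilled. */
        public void add(int key) {
            int h1 = mix(key);
            int h2 = mix(h1) | 1; // force an odd step so the probe sequence varies
            for (int i = 0; i < numHashes; i++) {
                bits.set(Math.floorMod(h1 + i * h2, numBits));
            }
        }

        /** Returns false only if the key was definitely never added (no false negatives). */
        public boolean mightContain(int key) {
            int h1 = mix(key);
            int h2 = mix(h1) | 1;
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                    return false;
                }
            }
            return true;
        }

        /** Integer bit mixer (MurmurHash3 finalizer constants). */
        private static int mix(int x) {
            x ^= x >>> 16;
            x *= 0x85ebca6b;
            x ^= x >>> 13;
            x *= 0xc2b2ae35;
            x ^= x >>> 16;
            return x;
        }
    }

    /** Build phase: populate the filter with the keys of the spilled build-side records. */
    public static SimpleBloomFilter buildFilter(List<Integer> spilledBuildKeys, int numBits, int numHashes) {
        SimpleBloomFilter filter = new SimpleBloomFilter(numBits, numHashes);
        for (int key : spilledBuildKeys) {
            filter.add(key);
        }
        return filter;
    }

    /** Probe phase: keep only the probe keys that may match a spilled build record. */
    public static List<Integer> probeKeysToSpill(SimpleBloomFilter filter, List<Integer> probeKeys) {
        List<Integer> toSpill = new ArrayList<>();
        for (int key : probeKeys) {
            if (filter.mightContain(key)) {
                toSpill.add(key); // possible match (or false positive): must be spilled
            }
            // otherwise: definitely no match in the spilled partition, so skip the disk write
        }
        return toSpill;
    }

    public static void main(String[] args) {
        List<Integer> spilledBuildKeys = Arrays.asList(1, 5, 9, 42);
        List<Integer> probeKeys = Arrays.asList(1, 2, 3, 42, 100);
        SimpleBloomFilter filter = buildFilter(spilledBuildKeys, 1024, 3);
        System.out.println("Probe keys still spilled: " + probeKeysToSpill(filter, probeKeys));
    }
}
```

Because a Bloom filter can return false positives but never false negatives, some non-matching probe records may still be spilled, but no matching record is ever dropped. The saving therefore depends on how selective the join is and on how many bits are allotted per spilled build key.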

              People

              • Assignee: Chengxiang Li
              • Reporter: Chengxiang Li
              • Votes: 0
              • Watchers: 7

                Dates

                • Created:
                  Updated:
                  Resolved: