PARQUET-2237

Improve performance when filters in RowGroupFilter can match exactly

Details

    • Improvement
    • Status: In Progress
    • Major
    • Resolution: Unresolved

    Description

      If the min/max statistics alone are enough to decide whether a row group matches, we no longer need to load the dictionary from the filesystem and compare its values one by one.

      Similarly, the Bloom filter has to be loaded from the filesystem, which can cost time and memory. If the min/max or dictionary filters can already determine exactly whether the value exists, we can skip the Bloom filter to improve performance.

      For example,

      1. When reading data greater than x1 in a block, if the min/max statistics show that all values are greater than x1, we don't need to read the dictionary and compare its values one by one.
      2. If we have already loaded the page dictionaries and compared their values one by one, we don't need to read the Bloom filter and compare again.
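
The two cases above can be sketched as a filter cascade that stops at the first exact answer. This is only an illustration of the idea, not the parquet-mr implementation: `Verdict`, `minmax_gt`, `dictionary_check`, `bloom_eq`, and `run_filters` are hypothetical names, nulls are ignored, and the dictionary is assumed to cover every page of the column chunk.

```python
from enum import Enum


class Verdict(Enum):
    DROP = "drop"        # the row group cannot contain a matching value
    KEEP = "keep"        # the row group certainly contains a matching value
    UNKNOWN = "unknown"  # this filter cannot decide exactly


def minmax_gt(min_value, max_value, x1):
    """Min/max check for the predicate `value > x1` (nulls ignored)."""
    if max_value <= x1:
        return Verdict.DROP     # no value can exceed x1
    if min_value > x1:
        return Verdict.KEEP     # every value exceeds x1
    return Verdict.UNKNOWN      # x1 lies inside the range, not decidable here


def dictionary_check(load_dictionary, predicate):
    """Exact only if every page is dictionary-encoded (assumed here)."""
    values = load_dictionary()  # the expensive read happens only now
    return Verdict.KEEP if any(predicate(v) for v in values) else Verdict.DROP


def bloom_eq(load_bloom, x):
    """A Bloom filter can prove absence, never presence."""
    might_contain = load_bloom()  # the expensive read happens only now
    return Verdict.UNKNOWN if might_contain(x) else Verdict.DROP


def run_filters(stages):
    """Run (name, thunk) stages in order; stop at the first exact verdict."""
    evaluated = []
    for name, evaluate in stages:
        evaluated.append(name)
        verdict = evaluate()
        if verdict is not Verdict.UNKNOWN:
            return verdict is Verdict.KEEP, evaluated
    return True, evaluated  # nobody could decide: keep the row group


loads = []  # records which expensive structures were actually read


def load_dictionary():
    loads.append("dictionary")
    return [12, 15, 40]


def load_bloom():
    loads.append("bloom")
    return lambda v: v in {12, 15, 40}


# Case 1: value > 10 with min = 12. Min/max decides, the dictionary is never read.
keep1, used1 = run_filters([
    ("minmax", lambda: minmax_gt(12, 40, 10)),
    ("dictionary", lambda: dictionary_check(load_dictionary, lambda v: v > 10)),
])

# Case 2: value == 20. The dictionary decides, the Bloom filter is never read.
keep2, used2 = run_filters([
    ("dictionary", lambda: dictionary_check(load_dictionary, lambda v: v == 20)),
    ("bloom", lambda: bloom_eq(load_bloom, 20)),
])
```

Each stage is passed as a thunk so that the dictionary or Bloom filter is loaded only if an earlier stage returned `UNKNOWN`. After both cases run, `loads` contains a single dictionary read and no Bloom filter read.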

          People

            miracle Mars
