IMPALA-9470

Use Parquet bloom filters


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Component/s: Backend

    Description

      PARQUET-41 has been closed recently. This means Parquet-MR is capable of writing and reading bloom filters.

      Currently, bloom filters are stored per column chunk, i.e. we can use them to filter out entire row groups.

      We already filter row groups in HdfsParquetScanner::NextRowGroup() based on column chunk statistics and dictionaries. Skipping row groups based on bloom filters could also be added to this function.
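
The row-group skipping described above can be illustrated with a small, self-contained sketch. It is not Impala's or Parquet-MR's code: `SimpleBloomFilter`, `RowGroup`, `build_row_group` and `scan_equals` are hypothetical names, and the filter is a classic k-hash Bloom filter built on `hashlib.blake2b` rather than the split-block layout with xxHash that the Parquet format specifies. The point is only the control flow: an equality predicate probes the filter of the relevant column chunk, and a negative answer lets the scanner skip the whole row group.

```python
import hashlib
from dataclasses import dataclass, field


class SimpleBloomFilter:
    """Classic k-hash Bloom filter; a stand-in, not Parquet's split-block layout."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value: bytes):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(value, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def insert(self, value: bytes) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


@dataclass
class RowGroup:
    """Hypothetical row group: rows plus one Bloom filter per filtered column."""
    rows: list
    filters: dict = field(default_factory=dict)


def build_row_group(rows, filtered_columns):
    """Write side: populate a filter for each requested column chunk."""
    rg = RowGroup(rows=rows)
    for col in filtered_columns:
        bf = SimpleBloomFilter()
        for row in rows:
            bf.insert(str(row[col]).encode())
        rg.filters[col] = bf
    return rg


def scan_equals(row_groups, column, value):
    """Read side: return (matching rows, number of row groups skipped)."""
    needle = str(value).encode()
    matches, skipped = [], 0
    for rg in row_groups:
        bf = rg.filters.get(column)
        if bf is not None and not bf.might_contain(needle):
            skipped += 1  # the filter proves the value is absent from this row group
            continue
        matches.extend(r for r in rg.rows if r[column] == value)
    return matches, skipped
```

Because a Bloom filter can return false positives but never false negatives, a skipped row group is always safe to skip, while a row group that passes the filter still has to be scanned. This also means the filter only helps equality-style predicates, which complements the min/max statistics already used for range predicates.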

      Impala could also write bloom filters.

            People

              Assignee: Daniel Becker (daniel.becker)
              Reporter: Zoltán Borók-Nagy (boroknagyz)