IMPALA-9470

Use Parquet bloom filters

    Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Backend

      Description

      PARQUET-41 has been closed recently. This means Parquet-MR is capable of writing and reading bloom filters.

      Bloom filters are currently stored per column chunk, so they can be used to filter out entire row groups.

      We already filter row groups in HdfsParquetScanner::NextRowGroup() based on column chunk statistics and dictionaries. Skipping row groups based on bloom filters could also be added to this function.
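
The idea of skipping row groups can be sketched roughly as follows. This is a toy illustration only, not Impala code: ToyBloomFilter, RowGroup, MakeRowGroup and RowGroupsToScan are hypothetical names, and the filter is a plain bit-array Bloom filter built on std::hash, whereas the Parquet format specifies a split-block Bloom filter with xxHash. What carries over is the contract: a negative answer means the value is definitely absent from the column chunk, so the row group can be skipped, while a positive answer only means the row group still has to be scanned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy Bloom filter (assumption: NOT the split-block layout Parquet uses).
class ToyBloomFilter {
 public:
  ToyBloomFilter(std::size_t num_bits, int num_hashes)
      : bits_(num_bits, false), num_hashes_(num_hashes) {
    assert(num_bits > 0 && num_hashes > 0);
  }

  void Insert(const std::string& value) {
    for (int i = 0; i < num_hashes_; ++i) bits_[BitIndex(value, i)] = true;
  }

  // false: value is definitely absent. true: value may be present.
  bool MightContain(const std::string& value) const {
    for (int i = 0; i < num_hashes_; ++i) {
      if (!bits_[BitIndex(value, i)]) return false;
    }
    return true;
  }

 private:
  // Double hashing: derive the i-th probe position from two base hashes.
  std::size_t BitIndex(const std::string& value, int i) const {
    const std::uint64_t h1 = std::hash<std::string>{}(value);
    const std::uint64_t h2 = ((h1 * 0x9E3779B97F4A7C15ULL) >> 17) | 1ULL;
    return static_cast<std::size_t>(
        (h1 + static_cast<std::uint64_t>(i) * h2) % bits_.size());
  }

  std::vector<bool> bits_;
  int num_hashes_;
};

// Hypothetical stand-in for a row group with one filtered column chunk.
struct RowGroup {
  int id;
  ToyBloomFilter filter;
};

// Builds a row group whose filter covers the given column values.
RowGroup MakeRowGroup(int id, const std::vector<std::string>& values) {
  RowGroup rg{id, ToyBloomFilter(1024, 3)};
  for (const std::string& v : values) rg.filter.Insert(v);
  return rg;
}

// Returns the ids of row groups that still need scanning for `col = value`.
// Row groups whose filter rules the value out are skipped entirely.
std::vector<int> RowGroupsToScan(const std::vector<RowGroup>& groups,
                                 const std::string& value) {
  std::vector<int> ids;
  for (const RowGroup& rg : groups) {
    if (rg.filter.MightContain(value)) ids.push_back(rg.id);
  }
  return ids;
}
```

Because a Bloom filter never produces false negatives, a row group that contains the value is always returned by RowGroupsToScan. A row group that does not contain it may occasionally be returned as well, which costs extra work but never affects correctness.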

      Impala could also write bloom filters.

              People

              • Assignee: daniel.becker Daniel Becker
              • Reporter: boroknagyz Zoltán Borók-Nagy
              • Votes: 0
              • Watchers: 7
