  1. IMPALA
  2. IMPALA-9470

Use Parquet bloom filters

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      PARQUET-41 has been closed recently. This means Parquet-MR is capable of writing and reading bloom filters.

      Currently, bloom filters are stored per column chunk, so they can be used to filter out entire row groups.

      We already filter row groups in HdfsParquetScanner::NextRowGroup() based on column chunk statistics and dictionaries. Skipping row groups based on bloom filters could also be added to this function.

      Impala could also write bloom filters.
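
      To illustrate the idea, here is a minimal Python sketch of per-column-chunk bloom filters being written and then used to skip row groups for an equality predicate. It is not Impala code and not the Parquet on-disk format: ToyBloomFilter is a plain k-hash filter rather than the split-block layout the Parquet format specifies, and RowGroup and row_groups_to_scan are hypothetical names introduced only for this example.

```python
import hashlib
from dataclasses import dataclass, field


class ToyBloomFilter:
    """Plain k-hash Bloom filter; NOT the split-block layout Parquet specifies."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str):
        # Derive k bit positions from salted SHA-256 digests of the value.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def insert(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


@dataclass
class RowGroup:
    """Hypothetical stand-in for a row group with one filter per column chunk."""

    values: dict  # column name -> list of string values
    filters: dict = field(default_factory=dict)

    def __post_init__(self):
        # Writer side: populate one filter per column chunk.
        for column, column_values in self.values.items():
            bloom = ToyBloomFilter()
            for v in column_values:
                bloom.insert(v)
            self.filters[column] = bloom


def row_groups_to_scan(row_groups, column, literal):
    """Reader side: keep only row groups that may satisfy `column = literal`."""
    return [
        rg for rg in row_groups
        if rg.filters[column].might_contain(literal)
    ]
```

      A bloom filter never produces false negatives, so a row group that contains the literal is always kept. A row group that does not contain it is usually skipped, but may occasionally be scanned because of a false positive.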

            People

            • Assignee:
              Unassigned
            • Reporter:
              Zoltán Borók-Nagy (boroknagyz)
            • Votes:
              0
            • Watchers:
              4
