IMPALA / IMPALA-10494

Making use of the min/max column stats to improve min/max filters


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None

    Description

      The Hive Metastore (HMS) API offers a means to store the minimum and maximum value of each column (https://hive.apache.org/javadocs/r3.0.0/api/org/apache/hadoop/hive/metastore/api/ColumnStatisticsData.html). For example, these stats for an integer column can be captured in a LongColumnStatsData object (https://hive.apache.org/javadocs/r3.0.0/api/org/apache/hadoop/hive/metastore/api/LongColumnStatsData.html).

      It would be desirable to use these per-column min and max stats to help form useful min/max filters, which can reduce the amount of data scanned for Parquet tables.
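      One way the column stats could feed into min/max filters, offered here purely as an assumption and not as Impala's actual or planned design, is to estimate up front whether a filter can prune anything: if a filter's [min, max] range covers the column's entire HMS [low, high] range, it cannot eliminate any row group and is not worth evaluating. The Java sketch below models the stats as plain longs mirroring the lowValue/highValue fields of LongColumnStatsData. The class, records, methods and the threshold parameter are all hypothetical, and the uniform-distribution estimate is a simplifying assumption.

```java
// Hypothetical sketch: none of these names are Impala or Hive APIs.
public class MinMaxFilterUsefulness {

    // Stand-in for the lowValue/highValue pair that HMS stores in LongColumnStatsData.
    record ColumnRange(long low, long high) {}

    // Stand-in for a min/max runtime filter built from the build side of a join.
    record MinMaxFilter(long min, long max) {}

    // Fraction of the column's [low, high] range that overlaps the filter's [min, max] range.
    // Assuming uniformly distributed values, this estimates the fraction of rows that survive.
    static double survivingFraction(ColumnRange stats, MinMaxFilter filter) {
        long lo = Math.max(stats.low(), filter.min());
        long hi = Math.min(stats.high(), filter.max());
        if (lo > hi) {
            return 0.0; // Disjoint ranges: the filter rejects every row.
        }
        double columnWidth = (double) stats.high() - stats.low() + 1;
        double overlapWidth = (double) hi - lo + 1;
        return overlapWidth / columnWidth;
    }

    // Treat a filter as useful only if it is expected to reject a meaningful share of rows.
    static boolean isUseful(ColumnRange stats, MinMaxFilter filter, double maxSurvivingFraction) {
        return survivingFraction(stats, filter) <= maxSurvivingFraction;
    }

    public static void main(String[] args) {
        ColumnRange stats = new ColumnRange(0, 99);
        System.out.println(survivingFraction(stats, new MinMaxFilter(10, 19)));  // 0.1
        System.out.println(isUseful(stats, new MinMaxFilter(-50, 200), 0.5));    // false
    }
}
```

      Under these assumptions, a filter of [10, 19] on a column whose stats span [0, 99] is estimated to keep 10% of the rows, while a filter of [-50, 200] keeps all of them and could be skipped.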


            People

              Assignee: Unassigned
              Reporter: Qifan Chen (sql_forever)
              Votes: 0
              Watchers: 5
