  1. Apache Drill
  2. DRILL-4053

Reduce metadata cache file size


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.4.0
    • Component/s: Metadata
    • Labels: None

    Description

      The Parquet metadata cache file contains a fair amount of redundant metadata, which bloats the size of the cache file. Two things that we can reduce are:
      1) The schema is repeated for every row group. We can keep a single merged schema instead (similar to what was discussed for the INSERT INTO functionality).
      2) The max and min values in the stats are used for partition pruning when the two values are the same. We can keep only the maxValue, and only when it is the same as the minValue.
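
      A minimal sketch of the two reductions described above, in Python for brevity. This is not Drill's actual implementation or cache file layout: the field names (`path`, `rowGroups`, `columns`, `name`, `type`, `minValue`, `maxValue`) and the function `compact_metadata` are hypothetical and chosen only for illustration. It also assumes a column has the same type in every row group; a real merged schema would need to handle conflicts.

```python
import json


def compact_metadata(files):
    """Return (merged_schema, compact_files) for a list of file entries.

    Hypothetical layout: each file has a "path" and a list of "rowGroups",
    and each row group has a list of "columns" with name, type and stats.
    """
    merged_schema = {}  # column name -> type, stored once for the whole cache
    compact_files = []
    for f in files:
        compact_row_groups = []
        for rg in f["rowGroups"]:
            compact_columns = []
            for col in rg["columns"]:
                # Reduction 1: record the type once in the merged schema
                # instead of repeating it in every row group.
                # Simplification: the first type seen for a name wins.
                merged_schema.setdefault(col["name"], col["type"])

                entry = {"name": col["name"]}
                # Reduction 2: keep only maxValue, and only when it equals
                # minValue (the case that matters for partition pruning).
                has_stats = "minValue" in col and "maxValue" in col
                if has_stats and col["minValue"] == col["maxValue"]:
                    entry["maxValue"] = col["maxValue"]
                compact_columns.append(entry)
            compact_row_groups.append({"columns": compact_columns})
        compact_files.append({"path": f["path"], "rowGroups": compact_row_groups})
    return merged_schema, compact_files


# Made-up sample data: one file, one row group, two columns.
sample = [{
    "path": "/data/part-0.parquet",
    "rowGroups": [{
        "columns": [
            {"name": "year", "type": "INT32", "minValue": 2015, "maxValue": 2015},
            {"name": "amount", "type": "DOUBLE", "minValue": 1.5, "maxValue": 99.0},
        ]
    }],
}]

schema, compact = compact_metadata(sample)
print(json.dumps({"schema": schema, "files": compact}, indent=2))
```

      In this sample, `year` keeps its single value because min and max agree, while `amount` loses both stats; the column types appear once in the merged schema rather than in each row group.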

          People

            parthc Parth Chandra
            parthc Parth Chandra
            Rahul Kumar Challapalli Rahul Kumar Challapalli
