[DRILL-4053] Reduce metadata cache file size - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.3.0
Fix Version/s: 1.4.0
Component/s: Metadata
Labels: None

Description

The Parquet metadata cache file contains a fair amount of redundant metadata, which bloats the file. Two things we can reduce:
1) The schema is repeated for every row group. We can keep a single merged schema instead (similar to what was discussed for the INSERT INTO functionality).
2) The max and min values in the stats are used for partition pruning when the values are the same. We can keep just maxValue, and only when it is the same as minValue.
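
The two reductions can be illustrated with a minimal Python sketch. This is not Drill's implementation or its actual cache file layout: the function name `compact_metadata` and the field names (`columns`, `type`, `min`, `max`, `rowCount`, `schema`, `rowGroups`) are hypothetical, chosen only to show the idea of hoisting one merged schema out of the row groups and keeping maxValue only when it equals the min.

```python
from typing import Any


def compact_metadata(row_groups: list[dict[str, Any]]) -> dict[str, Any]:
    """Hoist a single merged schema and trim per-column stats.

    Input layout (hypothetical): each row group carries its own
    column types plus min/max stats for every column.
    """
    merged_schema: dict[str, str] = {}
    compact_groups: list[dict[str, Any]] = []

    for rg in row_groups:
        columns: dict[str, dict[str, Any]] = {}
        for name, col in rg["columns"].items():
            # Reduction 1: record each column's type once, in the merged schema.
            known_type = merged_schema.setdefault(name, col["type"])
            if known_type != col["type"]:
                raise ValueError(f"conflicting types for column {name!r}")

            # Reduction 2: keep maxValue only when min and max are equal,
            # i.e. when the row group holds a single value for the column.
            entry: dict[str, Any] = {}
            lo, hi = col.get("min"), col.get("max")
            if lo is not None and lo == hi:
                entry["maxValue"] = hi
            columns[name] = entry

        compact_groups.append({"rowCount": rg["rowCount"], "columns": columns})

    return {"schema": merged_schema, "rowGroups": compact_groups}


if __name__ == "__main__":
    sample = [
        {"rowCount": 10, "columns": {
            "year": {"type": "INT32", "min": 2015, "max": 2015},
            "amount": {"type": "DOUBLE", "min": 1.0, "max": 9.5}}},
        {"rowCount": 5, "columns": {
            "year": {"type": "INT32", "min": 2016, "max": 2016},
            "amount": {"type": "DOUBLE", "min": 2.0, "max": 3.0}}},
    ]
    print(compact_metadata(sample))
```

In this sketch a column such as `year`, which has a single value per row group, keeps its maxValue and stays usable for pruning, while `amount` loses its stats entirely because its min and max differ.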

People

Assignee: Parth Chandra
Reporter: Parth Chandra
Reviewer: Rahul Kumar Challapalli
Votes: 0
Watchers: 3

Dates

Created: 09/Nov/15 18:54
Updated: 15/Dec/15 20:24
Resolved: 02/Dec/15 17:08