Description
The parquet metadata cache file does not store the correct min/max values for the varchar and interval data types.
I have attached a parquet file. Here is what parquet tools shows for varchar:
[varchar_col] BINARY 14.6% of all space [PLAIN, BIT_PACKED] min: 67 max: 67 average: 67 total: 67 (raw data: 65 saving -3%)
values: min: 1 max: 1 average: 1 total: 1
uncompressed: min: 65 max: 65 average: 65 total: 65
column values statistics: min: ioegjNJKvnkd, max: ioegjNJKvnkd, num_nulls: 0
Here is what the metadata cache file shows:
"name" : [ "varchar_col" ],
"minValue" : "aW9lZ2pOSkt2bmtk",
"maxValue" : "aW9lZ2pOSkt2bmtk",
"nulls" : 0
Here is what parquet tools shows for interval:
[interval_col] BINARY 11.3% of all space [PLAIN, BIT_PACKED] min: 52 max: 52 average: 52 total: 52 (raw data: 50 saving -4%)
values: min: 1 max: 1 average: 1 total: 1
uncompressed: min: 50 max: 50 average: 50 total: 50
column values statistics: min: P18582D, max: P18582D, num_nulls: 0
Here is what the metadata cache file shows:
"name" : [ "interval_col" ],
"minValue" : "UDE4NTgyRA==",
"maxValue" : "UDE4NTgyRA==",
"nulls" : 0
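One observation that may help narrow this down: the cached strings look like Base64 encodings of the values parquet tools reports. The snippet below is only a quick check under that assumption, using the values copied from this report. The names `cached`, `expected` and `decode_cached` are made up for illustration and are not part of Drill.

```python
import base64

# Values copied from the metadata cache file shown in this report.
cached = {
    "varchar_col": "aW9lZ2pOSkt2bmtk",
    "interval_col": "UDE4NTgyRA==",
}

# Values reported by parquet tools for the same columns.
expected = {
    "varchar_col": "ioegjNJKvnkd",
    "interval_col": "P18582D",
}


def decode_cached(value: str) -> str:
    """Decode a cached min/max string, assuming it is Base64-encoded UTF-8."""
    return base64.b64decode(value).decode("utf-8")


decoded = {name: decode_cached(value) for name, value in cached.items()}

for name, value in decoded.items():
    # Print the decoded value and whether it matches the parquet tools output.
    print(name, value, value == expected[name])
```

If both columns decode to the parquet tools values, the underlying bytes may be preserved and the mismatch may come from how binary min/max values are serialized into the cache file, rather than from the statistics themselves being wrong.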