According to the design document for DRILL-6552, the existing metadata cache API should be adapted to use the generalized Metastore API, and the Parquet metadata cache will be presented as an implementation of the Metastore API.
The aim of this Jira is to refactor the Parquet metadata cache implementation and adapt it to use the Drill Metastore API:
- Refactor AbstractParquetGroupScan and its implementations to use metastore metadata classes. Store Drill data types in metadata files for Parquet tables.
- Store the least restrictive type for each column instead of the column data type from the first file, as is done currently.
- Rework the logic in AbstractParquetGroupScan to allow filtering at different metadata layers: partition, file, row group, etc. Apply the same layered approach to limit pushdown.
- Implement logic to convert existing Parquet metadata to Metastore metadata to preserve backward compatibility.
- Implement fetching metadata only when it is needed (for filtering, limit, count, etc.).
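
To illustrate the "least restrictive type" item above, here is a minimal sketch of how a column type could be chosen across several files. It is not Drill's implementation: the `SimpleType` enum, its widening order, and the `leastRestrictive` helper are hypothetical names introduced only for this example, and real Drill types (including nullability modes) are richer than this.

```java
import java.util.Arrays;
import java.util.List;

final class LeastRestrictiveTypeSketch {

    // Hypothetical type lattice, declared from most to least restrictive.
    // This ordering is an assumption made for the example only.
    enum SimpleType { INT, BIGINT, FLOAT8, VARCHAR }

    // Returns the widest type seen for one column across all files,
    // rather than simply taking the type found in the first file.
    static SimpleType leastRestrictive(List<SimpleType> perFileTypes) {
        if (perFileTypes.isEmpty()) {
            throw new IllegalArgumentException("at least one file type is required");
        }
        SimpleType result = perFileTypes.get(0);
        for (SimpleType type : perFileTypes) {
            // A later declaration position means a less restrictive type here.
            if (type.ordinal() > result.ordinal()) {
                result = type;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The same column is INT in two files and BIGINT in another.
        List<SimpleType> columnTypes =
            Arrays.asList(SimpleType.INT, SimpleType.BIGINT, SimpleType.INT);
        System.out.println(leastRestrictive(columnTypes)); // prints BIGINT
    }
}
```

With the current first-file behavior, the column in this example would be recorded as INT. Choosing the least restrictive type records BIGINT, which can represent the values from every file.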