  1. Apache Drill
  2. DRILL-6552 Drill Metadata management "Drill Metastore"
  3. DRILL-6852

Adapt current Parquet Metadata cache implementation to use Drill Metastore API

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.16.0
    • None

    Description

      According to the design document for DRILL-6552, the existing metadata cache API should be adapted to use the generalized Metastore API, and the Parquet metadata cache will be presented as an implementation of that API.

      The aim of this Jira is to refactor the Parquet metadata cache implementation and adapt it to use the Drill Metastore API.

      Execution plan:

      • Refactor AbstractParquetGroupScan and its implementations to use metastore metadata classes. Store Drill data types in metadata files for Parquet tables.
      • Store the least restrictive type for each column instead of the column data type taken from the first file, as is done currently.
      • Rework logic in AbstractParquetGroupScan to allow filtering at different metadata layers: partition, file, row group, etc. Do the same for limit pushdown.
      • Implement logic to convert existing Parquet metadata to metastore metadata to preserve backward compatibility.
      • Implement fetching metadata only when it is needed (for filtering, limit, count, etc.).
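
      The "least restrictive type" item in the plan above can be illustrated with a minimal sketch. It is not Drill's implementation: the SimpleType enum, its widening order, and the method names are hypothetical stand-ins chosen for illustration. The only idea taken from the plan is that the column type stored in metadata is folded over all files rather than copied from the first one.

```java
import java.util.Arrays;
import java.util.List;

public class LeastRestrictiveType {

    // Hypothetical, simplified stand-in for Drill data types, declared from
    // most restrictive to least restrictive. This ordering is an assumption
    // made for the example, not Drill's actual type-widening rules.
    enum SimpleType { INT, BIGINT, DOUBLE, VARCHAR }

    // Returns whichever of the two types is less restrictive under the
    // declaration order above.
    static SimpleType widen(SimpleType a, SimpleType b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    // Folds the type of one column, as observed in each file, into a single
    // table-level type.
    static SimpleType leastRestrictive(List<SimpleType> perFileTypes) {
        if (perFileTypes.isEmpty()) {
            throw new IllegalArgumentException("at least one file type is required");
        }
        SimpleType result = perFileTypes.get(0);
        for (SimpleType type : perFileTypes) {
            result = widen(result, type);
        }
        return result;
    }

    public static void main(String[] args) {
        // The same column as seen in three files of one table.
        List<SimpleType> types =
            Arrays.asList(SimpleType.INT, SimpleType.DOUBLE, SimpleType.BIGINT);

        System.out.println("first file type: " + types.get(0));             // INT
        System.out.println("least restrictive: " + leastRestrictive(types)); // DOUBLE
    }
}
```

      Taking the first file's type would record INT here, which cannot represent the DOUBLE values in the second file. Folding over all files records DOUBLE instead.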

            People

              Assignee: Vova Vysotskyi (volodymyr)
              Reporter: Vova Vysotskyi (volodymyr)
              Reviewer: Aman Sinha
              Votes: 0
              Watchers: 5
