  1. Apache Drill
  2. DRILL-6552 Drill Metadata management "Drill Metastore"
  3. DRILL-6852

Adapt current Parquet Metadata cache implementation to use Drill Metastore API

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.16.0
    • Component/s: None
    • Labels:

      Description

      According to the design document for DRILL-6552, the existing metadata cache API should be adapted to use the generalized metastore API, and the Parquet metadata cache will be presented as one implementation of that API.

      The aim of this Jira is to refactor the Parquet Metadata cache implementation and adapt it to use the Drill Metastore API.
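
As a rough illustration of the idea above, the following sketch shows a generalized metadata interface with a Parquet cache adapter behind it. All names here (`TableMetadataSource`, `ParquetCacheFile`, `ParquetCacheSource`) are hypothetical and are not taken from Drill's code base; the sketch only assumes that a cache file records a row count per Parquet file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class MetastoreSketch {

    /** Hypothetical generalized metadata API; not Drill's actual interface. */
    interface TableMetadataSource {
        /** Returns the table row count, or empty if no metadata is available. */
        Optional<Long> rowCount(String tablePath);
    }

    /** Hypothetical stand-in for the content of a Parquet metadata cache file. */
    static class ParquetCacheFile {
        final Map<String, Long> rowCountPerFile = new HashMap<>();
    }

    /** Adapter that exposes Parquet cache files through the generalized API. */
    static class ParquetCacheSource implements TableMetadataSource {
        private final Map<String, ParquetCacheFile> cacheByTable;

        ParquetCacheSource(Map<String, ParquetCacheFile> cacheByTable) {
            this.cacheByTable = cacheByTable;
        }

        @Override
        public Optional<Long> rowCount(String tablePath) {
            ParquetCacheFile cache = cacheByTable.get(tablePath);
            if (cache == null) {
                return Optional.empty(); // no cache file exists for this table
            }
            long total = 0;
            for (long count : cache.rowCountPerFile.values()) {
                total += count; // table row count is the sum over its files
            }
            return Optional.of(total);
        }
    }
}
```

Callers written against the interface do not need to know whether the metadata comes from a cache file or from some other metastore implementation.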

      Execution plan:

      • Refactor AbstractParquetGroupScan and its implementations to use metastore metadata classes. Store Drill data types in metadata files for Parquet tables.
      • Store the least restrictive type for each column instead of the column data type taken from the first file, as is done currently.
      • Rework the logic in AbstractParquetGroupScan to allow filtering at different metadata layers: partition, file, row group, etc. Do the same for limit pushdown.
      • Implement logic to convert existing Parquet metadata to metastore metadata to preserve backward compatibility.
      • Implement fetching metadata only when it is needed (for filtering, limit, count, etc.).
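
Two of the plan items above, filtering at several metadata layers and fetching metadata only when needed, can be sketched together. The code below is a minimal illustration, not Drill's implementation: `Range`, `RowGroup`, `FileMeta` and `prune` are hypothetical names, the filter is assumed to be a simple equality on one numeric column, and row group metadata is assumed to sit behind a `Supplier` so it is loaded only for files that survive file-level pruning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class LayeredPruningSketch {

    /** Min/max statistics of one column at some metadata level (hypothetical). */
    static class Range {
        final long min;
        final long max;

        Range(long min, long max) {
            this.min = min;
            this.max = max;
        }

        boolean mayContain(long value) {
            return min <= value && value <= max;
        }
    }

    /** Row group level metadata (hypothetical). */
    static class RowGroup {
        final int index;
        final Range stats;

        RowGroup(int index, Range stats) {
            this.index = index;
            this.stats = stats;
        }
    }

    /** File level metadata; row groups are supplied lazily (hypothetical). */
    static class FileMeta {
        final String path;
        final Range stats;
        final Supplier<List<RowGroup>> rowGroups;

        FileMeta(String path, Range stats, Supplier<List<RowGroup>> rowGroups) {
            this.path = path;
            this.stats = stats;
            this.rowGroups = rowGroups;
        }
    }

    /** Returns "path#rowGroupIndex" for every row group that may hold the value. */
    static List<String> prune(List<FileMeta> files, long value) {
        List<String> kept = new ArrayList<>();
        for (FileMeta file : files) {
            if (!file.stats.mayContain(value)) {
                continue; // pruned at file level; row group metadata is never loaded
            }
            for (RowGroup rowGroup : file.rowGroups.get()) {
                if (rowGroup.stats.mayContain(value)) {
                    kept.add(file.path + "#" + rowGroup.index);
                }
            }
        }
        return kept;
    }
}
```

The same pattern would extend upward to a partition layer: each level is checked against its own statistics before the more detailed level below it is fetched.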

              People

              • Assignee:
                volodymyr Vova Vysotskyi
                Reporter:
                volodymyr Vova Vysotskyi
                Reviewer:
                Aman Sinha