DRILL-6852 (Apache Drill; sub-task of DRILL-6552 Drill Metadata management "Drill Metastore")

Adapt current Parquet Metadata cache implementation to use Drill Metastore API


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.16.0
    • None

    Description

      According to the design document for DRILL-6552, the existing metadata cache API should be adapted to use the generalized Metastore API, and the Parquet metadata cache will be presented as an implementation of that API.

      The aim of this Jira is to refactor the Parquet Metadata cache implementation and adapt it to use the Drill Metastore API.

      Execution plan:

      • Refactor AbstractParquetGroupScan and its implementations to use metastore metadata classes. Store Drill data types in metadata files for Parquet tables.
      • Store the least restrictive type for each column instead of the column data type taken from the first file, as is done currently.
      • Rework the logic in AbstractParquetGroupScan to allow filtering at different metadata layers: partition, file, row group, etc. Do the same for limit pushdown.
      • Implement logic to convert existing parquet metadata to metastore metadata to preserve backward compatibility.
      • Implement fetching metadata only when it is needed (for filtering, limit, count, etc.).
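
The "least restrictive type" item in the plan above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `ColumnType` enum is a hypothetical, linearly ordered stand-in for Drill's real type system, and `mergeSchemas` is not an actual Drill or Metastore method. It only shows the idea of widening a column's type across files rather than trusting the first file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LeastRestrictiveTypeSketch {

  // Hypothetical, simplified type order from most to least restrictive.
  // Real Drill types and their cast rules are richer than a single linear chain.
  enum ColumnType { INT, BIGINT, DOUBLE, VARCHAR }

  // Returns whichever of the two types comes later in the declaration order,
  // i.e. the one assumed here to be able to hold values of both.
  static ColumnType leastRestrictive(ColumnType a, ColumnType b) {
    return a.ordinal() >= b.ordinal() ? a : b;
  }

  // Merges per-file schemas (column name -> type) into one table-level schema,
  // widening a column's type whenever a later file needs a less restrictive one.
  static Map<String, ColumnType> mergeSchemas(List<Map<String, ColumnType>> fileSchemas) {
    Map<String, ColumnType> merged = new LinkedHashMap<>();
    for (Map<String, ColumnType> schema : fileSchemas) {
      for (Map.Entry<String, ColumnType> column : schema.entrySet()) {
        merged.merge(column.getKey(), column.getValue(),
            LeastRestrictiveTypeSketch::leastRestrictive);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, ColumnType> firstFile = new LinkedHashMap<>();
    firstFile.put("id", ColumnType.INT);
    Map<String, ColumnType> secondFile = new LinkedHashMap<>();
    secondFile.put("id", ColumnType.BIGINT);

    // Taking only the first file's type would yield INT; merging yields BIGINT.
    System.out.println(mergeSchemas(Arrays.asList(firstFile, secondFile))); // prints {id=BIGINT}
  }
}
```

With the first-file approach, the `id` column would be recorded as `INT` even though the second file holds `BIGINT` values; the merged schema avoids that mismatch.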


          People

            volodymyr Vova Vysotskyi
            volodymyr Vova Vysotskyi
            Aman Sinha Aman Sinha
