Apache Drill / DRILL-6552

Drill Metadata management "Drill Metastore"

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13.0
    • Fix Version/s: 1.18.0
    • Component/s: Metadata

    Description

      It would be useful for Drill to have a metastore that remembers previously defined schemas, so that Drill does not have to repeat the same work over and over again.

      A metastore would store schemas and statistics, which would speed up query validation, planning, and execution. It would also make Drill more stable and help avoid several kinds of issues, such as "schema change" exceptions and problems with the "limit 0" optimization.
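
      The idea of remembering a schema instead of rediscovering it can be sketched as follows. This is only a minimal illustration under stated assumptions: `InMemoryMetastore`, `schema_for`, `scan_and_infer`, and the table name `dfs.tmp.orders` are hypothetical names invented for this example and are not part of Drill's Metastore API.

```python
from typing import Callable, Dict

# A schema is modeled here as a mapping of column name -> type name.
Schema = Dict[str, str]


class InMemoryMetastore:
    """Hypothetical illustration only; not Drill's actual Metastore API."""

    def __init__(self, infer_schema: Callable[[str], Schema]) -> None:
        self._infer_schema = infer_schema
        self._schemas: Dict[str, Schema] = {}
        self.inference_calls = 0  # counts how often the expensive step ran

    def schema_for(self, table: str) -> Schema:
        # Reuse a stored schema if one exists; otherwise infer it once
        # and remember the result for later requests.
        if table not in self._schemas:
            self.inference_calls += 1
            self._schemas[table] = self._infer_schema(table)
        return self._schemas[table]


def scan_and_infer(table: str) -> Schema:
    # Stand-in for the expensive step of reading files to discover a schema.
    return {"id": "BIGINT", "name": "VARCHAR"}


metastore = InMemoryMetastore(scan_and_infer)
first = metastore.schema_for("dfs.tmp.orders")
second = metastore.schema_for("dfs.tmp.orders")
```

      In this sketch the second lookup is served from the stored entry, so the inference function runs only once per table. A real metastore would also need persistence and invalidation when the underlying files change, which the sketch leaves out.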

      One of the main candidates is the Hive Metastore.
      Starting with version 3.0, the Hive Metastore can run as a service separate from the Hive server:
      https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration

      An optional enhancement is to store Drill's profiles, UDFs, and plugin configurations in a metastore as well.

        Issue Links

        1. Research and investigate a way for collecting and storing table statistics in the scope of metastore integration (Sub-task, Resolved, Vova Vysotskyi)
        2. Adapt current Parquet Metadata cache implementation to use Drill Metastore API (Sub-task, Resolved, Vova Vysotskyi)
        3. Implement caching of BaseMetadata classes (Sub-task, Resolved, Vova Vysotskyi)
        4. File Metadata Metastore Plugin (Sub-task, Resolved, Vitalii Diravka)
        5. Adapt statistics to use Drill Metastore API (Sub-task, Resolved, Vova Vysotskyi)
        6. Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore (Sub-task, Resolved, Vova Vysotskyi)
        7. Move schema-related classes from exec module to be able to use them in metastore module (Sub-task, Resolved, Vova Vysotskyi)
        8. Implement Drill Iceberg Metastore plugin (Sub-task, Resolved, Arina Ielchiieva)
        9. Support Iceberg metadata expiration (Sub-task, Resolved, Arina Ielchiieva)
        10. Add vararg UDFs support (Sub-task, Resolved, Vova Vysotskyi)
        11. Introduce session options for the Drill Metastore (Sub-task, Resolved, Vova Vysotskyi)
        12. Create operator for handling metadata (Sub-task, Resolved, Vova Vysotskyi)
        13. Implement metadata usage for Parquet format plugin (Sub-task, Resolved, Vova Vysotskyi)
        14. Expose Drill Metastore data through INFORMATION_SCHEMA (Sub-task, Closed, Arina Ielchiieva)
        15. Implement metadata usage for text format plugin (Sub-task, Resolved, Vova Vysotskyi)
        16. Introduce ANALYZE TABLE statements (Sub-task, Resolved, Vova Vysotskyi)
        17. Allow passing table function parameters into ANALYZE statement (Sub-task, Resolved, Vova Vysotskyi)

          People

            volodymyr Vova Vysotskyi
            vitalii Vitalii Diravka
            Votes: 1
            Watchers: 16