[DRILL-6552] Drill Metadata management "Drill Metastore" - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.13.0
Fix Version/s: 1.18.0
Component/s: Metadata
Labels: doc-impacting

Description

It would be useful for Drill to have some sort of metastore which would enable Drill to remember previously defined schemata so Drill doesn’t have to do the same work over and over again.

A metastore would store table schemas and statistics, which would speed up query validation, planning, and execution. It would also make Drill more stable by avoiding several kinds of issues, such as "schema change" exceptions and problems around the "limit 0" optimization.
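
To illustrate the idea only, here is a minimal sketch of reusing stored schema and statistics instead of rediscovering them on every query. It is not Drill's Metastore API: `TableMetadata`, `ToyMetastore`, `scan_files`, and the staleness check based on a modification timestamp are all hypothetical names and assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical record: column (name, type) pairs plus a row-count statistic."""
    schema: Tuple[Tuple[str, str], ...]
    row_count: int
    last_modified: float


class ToyMetastore:
    """Hypothetical in-memory store; an illustration, not Drill's implementation."""

    def __init__(self, discover: Callable[[str], TableMetadata]):
        self._discover = discover  # expensive schema/statistics discovery
        self._tables: Dict[str, TableMetadata] = {}
        self.discover_calls = 0    # counts how often discovery actually ran

    def get(self, table: str, last_modified: float) -> TableMetadata:
        cached = self._tables.get(table)
        # Reuse stored metadata unless the table changed after it was collected.
        if cached is not None and cached.last_modified >= last_modified:
            return cached
        self.discover_calls += 1
        fresh = self._discover(table)
        self._tables[table] = fresh
        return fresh


def scan_files(table: str) -> TableMetadata:
    # Stand-in for reading file footers or sampling rows to infer a schema.
    return TableMetadata(
        schema=(("id", "INT"), ("name", "VARCHAR")),
        row_count=3,
        last_modified=100.0,
    )
```

With this sketch, two consecutive `get` calls for an unchanged table trigger only one discovery; a newer modification timestamp forces the metadata to be collected again.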

One of the main candidates is Hive Metastore.
Starting with version 3.0, Hive Metastore can run as a separate service from the Hive server:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration

An optional enhancement is to store Drill's profiles, UDFs, and plugin configurations in a metastore as well.

Issue Links

incorporates

DRILL-6852 Adapt current Parquet Metadata cache implementation to use Drill Metastore API

Resolved

DRILL-7098 File Metadata Metastore Plugin

Resolved

is depended upon by

DRILL-5192 REFRESH table METADATA as default system option

Resolved

is related to

DRILL-7028 Reduce the planning time of queries on large Parquet tables with large metadata cache files

Resolved

DRILL-7430 Drill Metastore analyze improvements

Open

DRILL-7684 Implement Drill Metastore metadata usage for Hive storage plugin

Open

relates to

DRILL-3588 Write back to Hive Metastore

Open

DRILL-6035 Specify Drill's JSON behavior

Open

DRILL-1328 Support table statistics

Resolved

requires

DRILL-5603 Replace String file paths to Hadoop Path

Resolved

DRILL-6604 Upgrade Drill Hive client to Hive3.1 version

Resolved

links to

Design document

Hangout Presentation

Metastore metadata collecting and retrieval

Sub-Tasks

1.	Research and investigate a way for collecting and storing table statistics in the scope of metastore integration	Resolved	Vova Vysotskyi
2.	Adapt current Parquet Metadata cache implementation to use Drill Metastore API	Resolved	Vova Vysotskyi
3.	Implement caching of BaseMetadata classes	Resolved	Vova Vysotskyi
4.	File Metadata Metastore Plugin	Resolved	Vitalii Diravka
5.	Adapt statistics to use Drill Metastore API	Resolved	Vova Vysotskyi
6.	Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore	Resolved	Vova Vysotskyi
7.	Move schema-related classes from exec module to be able to use them in metastore module	Resolved	Vova Vysotskyi
8.	Implement Drill Iceberg Metastore plugin	Resolved	Arina Ielchiieva
9.	Support Iceberg metadata expiration	Resolved	Arina Ielchiieva
10.	Add vararg UDFs support	Resolved	Vova Vysotskyi
11.	Introduce session options for the Drill Metastore	Resolved	Vova Vysotskyi
12.	Create operator for handling metadata	Resolved	Vova Vysotskyi
13.	Implement metadata usage for Parquet format plugin	Resolved	Vova Vysotskyi
14.	Expose Drill Metastore data through INFORMATION_SCHEMA	Closed	Arina Ielchiieva
15.	Implement metadata usage for text format plugin	Resolved	Vova Vysotskyi
16.	Introduce ANALYZE TABLE statements	Resolved	Vova Vysotskyi
17.	Allow passing table function parameters into ANALYZE statement	Resolved	Vova Vysotskyi

People

Assignee: Vova Vysotskyi

Reporter: Vitalii Diravka

Votes: 1

Watchers: 15

Dates

Created: 28/Jun/18 16:46

Updated: 01/Apr/20 17:43

Resolved: 20/Mar/20 10:44