[FLINK-20416] Need a cached catalog for HiveCatalog - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Not a Priority
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Connectors / Common, Connectors / Hive, Table SQL / Ecosystem
Labels:

Description

In OLAP scenarios, there are usually analytical queries whose running time is relatively short and which are sensitive to latency. In the current Blink SQL processing, the parse, validate, and optimize stages all need metadata from the catalog API, but each request to the catalog re-runs the underlying metadata query.

We may need a cached catalog that caches table schemas and statistics to avoid unnecessary repeated metadata requests.
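To make the idea concrete, here is a minimal sketch of a delegating cache, not the implementation in the PR. `MetaCatalog`, `CachingMetaCatalog`, and `CountingCatalog` are hypothetical names invented for illustration; they are not Flink's `Catalog` or `AbstractCatalog` API, and the schema is reduced to a plain string. Expiry and statistics caching are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a catalog's metadata lookup; NOT Flink's Catalog interface. */
interface MetaCatalog {
    String getTableSchema(String tablePath);
}

/** Wraps any MetaCatalog and serves repeated schema lookups from memory. */
class CachingMetaCatalog implements MetaCatalog {
    private final MetaCatalog delegate;
    private final Map<String, String> schemaCache = new ConcurrentHashMap<>();

    CachingMetaCatalog(MetaCatalog delegate) {
        this.delegate = delegate;
    }

    @Override
    public String getTableSchema(String tablePath) {
        // Only the first lookup per table reaches the underlying catalog.
        return schemaCache.computeIfAbsent(tablePath, delegate::getTableSchema);
    }

    /** Drops a cached entry, e.g. after the table has been altered. */
    void invalidate(String tablePath) {
        schemaCache.remove(tablePath);
    }
}

/** Fake backend (assumption for the demo) that counts how often it is queried. */
class CountingCatalog implements MetaCatalog {
    final AtomicInteger calls = new AtomicInteger();

    @Override
    public String getTableSchema(String tablePath) {
        calls.incrementAndGet();
        return "schema-of-" + tablePath;
    }
}
```

With this sketch, two consecutive calls to `getTableSchema("db.orders")` on a `CachingMetaCatalog` hit the backend once; a real version would also need an invalidation or expiry policy so that schema changes made outside the session become visible.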

Design doc: https://docs.google.com/document/d/1oL8HUpv2WaF6OkFvbH5iefXkOJB__Dal_bYsIZJA_Gk/edit?usp=sharing

I have submitted a related PR that adds a generic cached catalog, which can delegate to other implementations of AbstractCatalog.

https://github.com/apache/flink/pull/14260