Flink / FLINK-20416

Need a cached catalog for HiveCatalog


Details

    Description

      In OLAP scenarios there are usually analytical queries whose running time is relatively short and which are sensitive to latency. In the current Blink SQL processing, the parse, validate, and optimize stages all need metadata from the catalog API, but each request to the catalog re-runs the underlying metadata query.

      We may need a cached catalog that caches table schemas and statistics to avoid these unnecessary repeated metadata requests.

      Design doc: https://docs.google.com/document/d/1oL8HUpv2WaF6OkFvbH5iefXkOJB__Dal_bYsIZJA_Gk/edit?usp=sharing

      I have submitted a related PR that adds a generic cached catalog, which can delegate to other implementations of AbstractCatalog.

      https://github.com/apache/flink/pull/14260
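
To illustrate the delegation idea, here is a minimal sketch of a caching wrapper around a catalog lookup. It is not the code from the PR and does not use Flink's catalog classes: `MetaCatalog`, `CachingMetaCatalog`, `getTableSchema`, the TTL parameter, and `invalidate` are all hypothetical names chosen for illustration, and the schema is modelled as a plain string.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a catalog lookup; NOT Flink's Catalog interface. */
interface MetaCatalog {
    String getTableSchema(String tablePath);
}

/** Wraps any MetaCatalog and serves repeated lookups from memory until an entry expires. */
final class CachingMetaCatalog implements MetaCatalog {

    /** A cached value together with the time at which it was loaded. */
    private static final class Entry {
        final String schema;
        final long loadedAtNanos;

        Entry(String schema, long loadedAtNanos) {
            this.schema = schema;
            this.loadedAtNanos = loadedAtNanos;
        }
    }

    private final MetaCatalog delegate;
    private final long ttlNanos;
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    CachingMetaCatalog(MetaCatalog delegate, long ttlMillis) {
        this.delegate = delegate;
        this.ttlNanos = ttlMillis * 1_000_000L;
    }

    @Override
    public String getTableSchema(String tablePath) {
        long now = System.nanoTime();
        Entry entry = cache.get(tablePath);
        // Miss or expired entry: ask the delegate once and remember the answer.
        if (entry == null || now - entry.loadedAtNanos >= ttlNanos) {
            entry = new Entry(delegate.getTableSchema(tablePath), now);
            cache.put(tablePath, entry);
        }
        return entry.schema;
    }

    /** Drops a cached entry, e.g. after the table was altered (assumed hook). */
    void invalidate(String tablePath) {
        cache.remove(tablePath);
    }
}

final class CachedCatalogDemo {
    public static void main(String[] args) {
        AtomicInteger metastoreCalls = new AtomicInteger();
        // Simulated metastore that counts how often it is actually queried.
        MetaCatalog metastore = path -> {
            metastoreCalls.incrementAndGet();
            return "schema-of-" + path;
        };
        CachingMetaCatalog cached = new CachingMetaCatalog(metastore, 60_000L);

        cached.getTableSchema("db.orders");
        cached.getTableSchema("db.orders");
        System.out.println(metastoreCalls.get()); // prints 1: second lookup was a cache hit

        cached.invalidate("db.orders");
        cached.getTableSchema("db.orders");
        System.out.println(metastoreCalls.get()); // prints 2: reloaded after invalidation
    }
}
```

A time-based expiry is only one possible policy. A real cached catalog would also have to decide how DDL statements and external changes to the metastore invalidate entries, which this sketch leaves to the caller through `invalidate`.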

      Attachments

        1. hms cache.jpg
          83 kB
          Sebastian Liu
        2. hms cache.jpg
          49 kB
          Sebastian Liu


            People

              Assignee: Unassigned
              Reporter: Sebastian Liu (shared_ptr)
              Votes: 0
              Watchers: 12
