Flink / FLINK-20416

Need a cached catalog for HiveCatalog

    Details

      Description

      In OLAP scenarios there are usually analytical queries whose running time is relatively short and which are sensitive to latency. In the current Blink SQL processing, the parse, validate, and optimize stages all need metadata from the catalog API, but each request to the catalog re-runs the underlying metadata query.

      We may need a cached catalog that caches table schemas and statistics to avoid unnecessary repeated metadata requests.

      Design doc: https://docs.google.com/document/d/1oL8HUpv2WaF6OkFvbH5iefXkOJB__Dal_bYsIZJA_Gk/edit?usp=sharing

      I have submitted a related PR that adds a generic cached catalog, which can delegate to other implementations of AbstractCatalog.

      https://github.com/apache/flink/pull/14260
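
      To illustrate the delegation idea, here is a minimal sketch of a caching wrapper with a time-to-live. It is not the code from the PR and does not use Flink's catalog API: the names TableMetaSource, CachingTableMetaSource, getTableSchema, and invalidate are hypothetical, and the schema is simplified to a String.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a catalog's metadata lookup; not Flink's Catalog API. */
interface TableMetaSource {
    String getTableSchema(String tablePath);
}

/** Wraps another TableMetaSource and caches its answers for ttlMillis. */
final class CachingTableMetaSource implements TableMetaSource {

    /** A cached schema together with the time it was loaded. */
    private static final class Entry {
        final String schema;
        final long loadedAtMillis;

        Entry(String schema, long loadedAtMillis) {
            this.schema = schema;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final TableMetaSource delegate;
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    CachingTableMetaSource(TableMetaSource delegate, long ttlMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    @Override
    public String getTableSchema(String tablePath) {
        long now = clock.getAsLong();
        // Reuse the entry while it is fresh; otherwise ask the delegate again.
        Entry entry = cache.compute(tablePath, (path, old) ->
                (old != null && now - old.loadedAtMillis < ttlMillis)
                        ? old
                        : new Entry(delegate.getTableSchema(path), now));
        return entry.schema;
    }

    /** Drops a cached entry, e.g. after the table is altered or dropped. */
    void invalidate(String tablePath) {
        cache.remove(tablePath);
    }
}
```

      In this sketch a repeated lookup within the TTL never reaches the delegate, which is the saving the parse/validate/optimize stages would benefit from. A real implementation would also have to decide how to invalidate entries on DDL and how to bound the cache size.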

        Attachments

        1. hms cache.jpg
          83 kB
          Sebastian Liu
        2. hms cache.jpg
          49 kB
          Sebastian Liu

              People

              • Assignee: Unassigned
              • Reporter: shared_ptr Sebastian Liu
              • Votes: 0
              • Watchers: 8

                Dates

                • Created:
                  Updated: