Kylin / KYLIN-5948

Support internal table


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0.0
    • Fix Version/s: 5.0.0
    • Component/s: Job Engine, Query Engine
    • Labels: None

    Description

      01 Background
      The internal table feature is designed to improve the performance of detail-data and ad-hoc queries.
      An internal table manages the user's data directly in Kylin's internal storage, where Kylin controls the storage format and data organization in order to improve query performance.

      02 Dev Design
      What needs to be done is as follows:
      1. Define the internal table metadata

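      // Project and database that the table belongs to
      protected String project;
      protected final DatabaseDesc database;
      // Table identity and table name
      protected String identity;
      protected String name;

      // Table properties, e.g. primaryKey and orderByKey
      private Map<String, String> tblProperties;
      // Storage format and storage location of the table data
      private StorageType storageType;
      private String location;

      public enum StorageType {
          parquet,   // only for dev/UT
          gluten,    // ClickHouse MergeTree (default)
          deltalake, // future
          iceberg    // future
      }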
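As a rough illustration of how these fields might fit together, here is a minimal self-contained sketch. The class name `InternalTableSketch`, the plain `String` used in place of `DatabaseDesc`, and the `<database>.<name>` identity format are all assumptions made for illustration; only the field names and the `gluten` default come from the description above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name; the issue lists only the fields.
class InternalTableSketch {
    enum StorageType { parquet, gluten, deltalake, iceberg }

    private final String project;
    private final String database; // plain String stand-in for DatabaseDesc
    private final String name;
    private final Map<String, String> tblProperties = new LinkedHashMap<>();
    private StorageType storageType = StorageType.gluten; // default per the design
    private String location;

    InternalTableSketch(String project, String database, String name) {
        this.project = project;
        this.database = database;
        this.name = name;
    }

    // Assumption: identity is "<database>.<name>"; the issue does not define it.
    String getIdentity() { return database + "." + name; }

    String getProject() { return project; }

    StorageType getStorageType() { return storageType; }

    void setStorageType(StorageType storageType) { this.storageType = storageType; }

    String getLocation() { return location; }

    void setLocation(String location) { this.location = location; }

    void setProperty(String key, String value) { tblProperties.put(key, value); }

    // Read-only view so callers cannot bypass setProperty.
    Map<String, String> getTblProperties() {
        return Collections.unmodifiableMap(tblProperties);
    }
}
```

A caller would create the object with a project, database and table name, then set `location` and properties such as `primaryKey` before saving the metadata.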

      2. Implement the internal table catalog

      We implement a Kylin internal table catalog that extends Spark's TableCatalog; only the loadTable function needs to be implemented. The KyinternalCatalog reads table metadata from the Kylin metadata database and exposes each table to Spark as a ClickhouseTableV2 (from Apache Gluten).
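The lookup step of such a catalog can be sketched without Spark on the classpath. The following is a simplified stand-in, not the actual implementation: `TableMeta`, `TableHandle` and `InternalCatalogSketch` are hypothetical names, the in-memory map stands in for the Kylin metadata database, and a real catalog would implement Spark's `TableCatalog` and return a Gluten table object rather than `TableHandle`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical metadata record; stands in for the internal table metadata.
final class TableMeta {
    final String identity;
    final String location;

    TableMeta(String identity, String location) {
        this.identity = identity;
        this.location = location;
    }
}

// Hypothetical handle; stands in for the table object handed to Spark.
final class TableHandle {
    final String name;
    final String location;

    TableHandle(String name, String location) {
        this.name = name;
        this.location = location;
    }
}

// Simplified catalog that models only the loadTable lookup.
final class InternalCatalogSketch {
    // In-memory stand-in for the metadata database.
    private final Map<String, TableMeta> metaStore = new HashMap<>();

    void register(TableMeta meta) {
        metaStore.put(meta.identity, meta);
    }

    // Resolve "<database>.<table>" to metadata and wrap it in a handle.
    TableHandle loadTable(String database, String table) {
        String identity = database + "." + table;
        TableMeta meta = metaStore.get(identity);
        if (meta == null) {
            throw new NoSuchElementException("Internal table not found: " + identity);
        }
        return new TableHandle(meta.identity, meta.location);
    }
}
```

Keeping the catalog read-only in this way matches the design above, where table management goes through the web UI rather than through DDL statements.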

       

      3. Implement the create table, update table, delete table, truncate table, and related functions

      Table management operations are available through the web UI and the open API (TODO); DDL statements are not supported yet.

       
      4. Implement loading data into an internal table

      Support full load and incremental load.
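The difference between the two load modes can be illustrated with a toy in-memory model. This is a sketch under assumptions, not the Kylin job logic: it assumes a full load replaces all stored data, while an incremental load writes only the partitions present in the incoming batch and leaves the others untouched. The class `LoadSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of an internal table: partition value -> rows.
final class LoadSketch {
    private final Map<String, List<String>> partitions = new TreeMap<>();

    // Full load (assumed semantics): drop everything, then write the source.
    void fullLoad(Map<String, List<String>> source) {
        partitions.clear();
        source.forEach((part, rows) -> partitions.put(part, new ArrayList<>(rows)));
    }

    // Incremental load (assumed semantics): write only the partitions in the
    // batch; existing partitions that are not in the batch are kept as they are.
    void incrementalLoad(Map<String, List<String>> batch) {
        batch.forEach((part, rows) -> partitions.put(part, new ArrayList<>(rows)));
    }

    int rowCount() {
        return partitions.values().stream().mapToInt(List::size).sum();
    }

    boolean hasPartition(String part) {
        return partitions.containsKey(part);
    }
}
```

Under these assumptions, a daily job would call `incrementalLoad` with only the newest date partition, and a rebuild would call `fullLoad` with the complete source.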

      5. Support partition, bucket feature
      6. Support config table properties such as primaryKey, orderByKey
      7. Support gluten-mergetree as default storage type
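To make items 5 and 6 more concrete, the sketch below shows one way table properties and bucket assignment could look. The property keys `primaryKey` and `orderByKey` come from the list above; the helper class `TableLayoutSketch`, the comma-separated value format, and the hash-modulo bucket rule are assumptions for illustration only and may differ from what Kylin and Gluten actually do.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Kylin.
final class TableLayoutSketch {

    // Build the table properties map with the two keys named in the design.
    static Map<String, String> tableProperties(String primaryKey, String orderByKey) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("primaryKey", primaryKey);
        props.put("orderByKey", orderByKey);
        return props;
    }

    // Assumed bucketing rule: hash of the bucket column value modulo the
    // bucket count. floorMod keeps the result non-negative for negative hashes.
    static int bucketOf(String bucketColumnValue, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        return Math.floorMod(bucketColumnValue.hashCode(), numBuckets);
    }
}
```

With this rule, rows that share a bucket column value always land in the same bucket, which is what allows a query that filters on that column to skip the other buckets.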

      Query process with internal table: (diagram not included)

      03 Roadmap TODOs
      1. Support cache pre-loading
      2. Support more partition types


          People

            lucao Cao, Lionel
            loneylee Shuai Li