  1. Apache Hudi
  2. HUDI-1194

Reorganize HoodieHiveClient and make it fully support Hive Metastore API


Details

    • Task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • 1.1.0, 0.15.0
    • None

    Description

      Currently, HoodieHiveClient can perform Hive operations in three ways: through Hive JDBC, through the Hive Metastore API, or through the Hive Driver.

      The parameter hoodie.datasource.hive_sync.use_jdbc controls whether Hive JDBC is used. However, this parameter does not accurately describe what actually happens.

      With the current logic, when use_jdbc is set to true, most methods in HoodieHiveClient use JDBC and a few use the Hive Metastore API.
      When use_jdbc is set to false, most methods in HoodieHiveClient use the Hive Driver and a few use the Hive Metastore API.

      The following table shows what is actually used when use_jdbc is set to true or false.

      Method                    use_jdbc=true    use_jdbc=false
      addPartitionsToTable      JDBC             Hive Driver
      updatePartitionsToTable   JDBC             Hive Driver
      scanTablePartitions       Metastore API    Metastore API
      updateTableDefinition     JDBC             Hive Driver
      createTable               JDBC             Hive Driver
      getTableSchema            JDBC             Metastore API
      doesTableExist            Metastore API    Metastore API
      getLastCommitTimeSynced   Metastore API    Metastore API
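
      The table can also be read as a small lookup function. The following is only an illustrative sketch of the mapping, not code from HoodieHiveClient; the class name SyncBackendTable, the Backend enum, and backendFor are hypothetical names introduced for this example.

```java
public class SyncBackendTable {

    // The three ways HoodieHiveClient talks to Hive, as listed in the description.
    enum Backend { JDBC, HIVE_DRIVER, METASTORE_API }

    // Hypothetical helper: returns the backend the table above lists for a
    // given method name and use_jdbc setting.
    static Backend backendFor(String method, boolean useJdbc) {
        switch (method) {
            // These always go through the Metastore API, whatever use_jdbc is.
            case "scanTablePartitions":
            case "doesTableExist":
            case "getLastCommitTimeSynced":
                return Backend.METASTORE_API;
            // JDBC when use_jdbc=true, Metastore API (not Hive Driver) otherwise.
            case "getTableSchema":
                return useJdbc ? Backend.JDBC : Backend.METASTORE_API;
            // JDBC when use_jdbc=true, Hive Driver otherwise.
            case "addPartitionsToTable":
            case "updatePartitionsToTable":
            case "updateTableDefinition":
            case "createTable":
                return useJdbc ? Backend.JDBC : Backend.HIVE_DRIVER;
            default:
                throw new IllegalArgumentException("Unknown method: " + method);
        }
    }
}
```

      Written this way, it is easy to see that use_jdbc=false never means "Metastore API only", which is why the parameter name is misleading.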

      bschell and I developed several Metastore API implementations, for createTable, addPartitionsToTable, updatePartitionsToTable and updateTableDefinition, which will be helpful for several issues, e.g. resolving the null partition Hive sync issue and supporting ALTER_TABLE cascade with the AWS Glue catalog.

      However, it seems hard to organize three implementations within the current config, so we plan to split HoodieHiveClient into three classes:

      1. HoodieHiveClient, which implements all the APIs through the Metastore API.
      2. HoodieHiveJDBCClient, which extends HoodieHiveClient and overrides several of the APIs to use Hive JDBC.
      3. HoodieHiveDriverClient, which extends HoodieHiveClient and overrides several of the APIs to use the Hive Driver.
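
      A minimal sketch of how this hierarchy might look. The three class names come from the plan above; everything else is an assumption made for illustration: the outer class HiveClientHierarchySketch, the simplified String-returning signatures, and the choice of createTable as the overridden method. The real methods take different arguments and call Hive.

```java
public class HiveClientHierarchySketch {

    // Base class: every operation goes through the Metastore API.
    static class HoodieHiveClient {
        String createTable(String table) {
            return "metastore:createTable:" + table;
        }

        String doesTableExist(String table) {
            return "metastore:doesTableExist:" + table;
        }
    }

    // Overrides only the operations that should go through Hive JDBC
    // (which ones exactly is an assumption here).
    static class HoodieHiveJDBCClient extends HoodieHiveClient {
        @Override
        String createTable(String table) {
            return "jdbc:createTable:" + table;
        }
    }

    // Overrides only the operations that should go through the Hive Driver
    // (again, the exact set is an assumption).
    static class HoodieHiveDriverClient extends HoodieHiveClient {
        @Override
        String createTable(String table) {
            return "driver:createTable:" + table;
        }
    }
}
```

      With this layout, anything a subclass does not override, such as doesTableExist in the sketch, falls back to the Metastore API implementation in the base class.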

      We also introduce a new parameter, hoodie.datasource.hive_sync.hive_client_class, which lets you choose which Hive client class to use.
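
      One way such a parameter could be resolved is by reflection. This is a sketch under assumptions, not the proposed implementation: only the key hoodie.datasource.hive_sync.hive_client_class comes from this issue. The classes MetastoreClient and JdbcClient, the createClient helper, and the fallback to the Metastore-based client when the key is unset are all hypothetical.

```java
import java.util.Properties;

public class HiveClientSelectionSketch {

    // Config key proposed in this issue.
    static final String HIVE_CLIENT_CLASS = "hoodie.datasource.hive_sync.hive_client_class";

    // Hypothetical stand-in for the Metastore-based base client.
    public static class MetastoreClient {
        public MetastoreClient() {}

        public String backend() {
            return "metastore";
        }
    }

    // Hypothetical stand-in for the JDBC-based subclass.
    public static class JdbcClient extends MetastoreClient {
        public JdbcClient() {}

        @Override
        public String backend() {
            return "jdbc";
        }
    }

    // Loads the class named in the config and instantiates it through its
    // no-argument constructor. Falling back to MetastoreClient when the key
    // is missing is an assumption of this sketch.
    static MetastoreClient createClient(Properties props) throws ReflectiveOperationException {
        String className = props.getProperty(HIVE_CLIENT_CLASS, MetastoreClient.class.getName());
        Class<?> clazz = Class.forName(className);
        // asSubclass rejects classes that are not client implementations.
        return clazz.asSubclass(MetastoreClient.class).getDeclaredConstructor().newInstance();
    }
}
```

      Selecting by class name rather than by a boolean flag would also leave room for additional client implementations without adding new parameters.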


            People

              Assignee: Unassigned
              Reporter: Wenning Ding (wenningd)
              Votes: 0
              Watchers: 3