Details
- Type: Task
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Currently, HoodieHiveClient performs Hive operations in three ways: through Hive JDBC, through the Hive Metastore API, and through the Hive Driver.
The parameter hoodie.datasource.hive_sync.use_jdbc controls whether Hive JDBC is used. However, this parameter does not accurately describe what actually happens.
When use_jdbc is set to true, most methods in HoodieHiveClient use JDBC, and a few use the Hive Metastore API.
When use_jdbc is set to false, most methods in HoodieHiveClient use the Hive Driver, and a few use the Hive Metastore API.
The following table shows which mechanism each method actually uses when use_jdbc is set to true or false.
Method | use_jdbc=true | use_jdbc=false |
addPartitionsToTable | JDBC | Hive Driver |
updatePartitionsToTable | JDBC | Hive Driver |
scanTablePartitions | Metastore API | Metastore API |
updateTableDefinition | JDBC | Hive Driver |
createTable | JDBC | Hive Driver |
getTableSchema | JDBC | Metastore API |
doesTableExist | Metastore API | Metastore API |
getLastCommitTimeSynced | Metastore API | Metastore API |
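To make the mismatch concrete, the table above can be encoded as a small lookup. This is only an illustrative sketch, not Hudi code: the class name HiveSyncBackendTable, the Backend enum, and the backendFor helper are hypothetical names introduced here; only the method names and the backends come from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: encodes the table above; this is not actual Hudi code. */
public class HiveSyncBackendTable {
    enum Backend { JDBC, HIVE_DRIVER, METASTORE_API }

    // method name -> {backend when use_jdbc=true, backend when use_jdbc=false}
    private static final Map<String, Backend[]> TABLE = new LinkedHashMap<>();
    static {
        TABLE.put("addPartitionsToTable", new Backend[] {Backend.JDBC, Backend.HIVE_DRIVER});
        TABLE.put("updatePartitionsToTable", new Backend[] {Backend.JDBC, Backend.HIVE_DRIVER});
        TABLE.put("scanTablePartitions", new Backend[] {Backend.METASTORE_API, Backend.METASTORE_API});
        TABLE.put("updateTableDefinition", new Backend[] {Backend.JDBC, Backend.HIVE_DRIVER});
        TABLE.put("createTable", new Backend[] {Backend.JDBC, Backend.HIVE_DRIVER});
        TABLE.put("getTableSchema", new Backend[] {Backend.JDBC, Backend.METASTORE_API});
        TABLE.put("doesTableExist", new Backend[] {Backend.METASTORE_API, Backend.METASTORE_API});
        TABLE.put("getLastCommitTimeSynced", new Backend[] {Backend.METASTORE_API, Backend.METASTORE_API});
    }

    /** Returns the backend the table lists for the given method and use_jdbc value. */
    static Backend backendFor(String method, boolean useJdbc) {
        Backend[] row = TABLE.get(method);
        if (row == null) {
            throw new IllegalArgumentException("Unknown method: " + method);
        }
        return useJdbc ? row[0] : row[1];
    }

    public static void main(String[] args) {
        // Print one line per method, mirroring the table layout.
        for (String method : TABLE.keySet()) {
            System.out.println(method + " | " + backendFor(method, true)
                + " | " + backendFor(method, false));
        }
    }
}
```

Even with use_jdbc=true, three of the eight methods never touch JDBC, which is why the flag name is misleading.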
bschell and I developed several Metastore API implementations for createTable, addPartitionsToTable, updatePartitionsToTable, and updateTableDefinition, which will help with several issues, e.g. resolving the null partition Hive sync issue and supporting ALTER_TABLE cascade with the AWS Glue catalog.
However, it seems hard to organize three implementations under the current config, so we plan to split HoodieHiveClient into three classes:
- HoodieHiveClient, which implements all the APIs through the Metastore API.
- HoodieHiveJDBCClient, which extends HoodieHiveClient and overrides several of the APIs to use Hive JDBC.
- HoodieHiveDriverClient, which extends HoodieHiveClient and overrides several of the APIs to use the Hive Driver.
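The proposed hierarchy could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the planned implementation: the three class names come from the list above, but the wrapper class HiveClientHierarchySketch, the method signatures, the string return values, and the choice of which methods each subclass overrides are all hypothetical.

```java
public class HiveClientHierarchySketch {

    /** Base class: every operation goes through the Metastore API (stubbed as a label). */
    static class HoodieHiveClient {
        String createTable(String table) { return "Metastore API"; }
        String addPartitionsToTable(String table) { return "Metastore API"; }
        String scanTablePartitions(String table) { return "Metastore API"; }
    }

    /** Overrides only selected operations to use Hive JDBC (selection is an assumption). */
    static class HoodieHiveJDBCClient extends HoodieHiveClient {
        @Override
        String createTable(String table) { return "JDBC"; }

        @Override
        String addPartitionsToTable(String table) { return "JDBC"; }
    }

    /** Overrides only selected operations to use the Hive Driver (selection is an assumption). */
    static class HoodieHiveDriverClient extends HoodieHiveClient {
        @Override
        String createTable(String table) { return "Hive Driver"; }

        @Override
        String addPartitionsToTable(String table) { return "Hive Driver"; }
    }

    public static void main(String[] args) {
        HoodieHiveClient client = new HoodieHiveJDBCClient();
        // Overridden method uses the subclass backend.
        System.out.println("createTable -> " + client.createTable("t"));
        // Method that is not overridden falls back to the Metastore API base.
        System.out.println("scanTablePartitions -> " + client.scanTablePartitions("t"));
    }
}
```

With this layout, any operation a subclass does not override falls back to the Metastore API implementation, so each class states explicitly where it departs from the default.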
We will also introduce a new parameter, hoodie.datasource.hive_sync.hive_client_class, which lets you choose which Hive client class to use.
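One way such a parameter could be resolved is by loading the configured class reflectively. The sketch below is only an assumption about how this might work: the property key is the one proposed above, but HiveClientSelectionSketch, the SyncClient interface, the stub client classes, the create helper, and the choice of the Metastore-based client as the default are hypothetical.

```java
import java.util.Properties;

public class HiveClientSelectionSketch {
    static final String CLIENT_CLASS_KEY = "hoodie.datasource.hive_sync.hive_client_class";

    /** Hypothetical stand-in for the common client API. */
    interface SyncClient {
        String backend();
    }

    public static class MetastoreClient implements SyncClient {
        @Override
        public String backend() { return "Metastore API"; }
    }

    public static class JdbcClient implements SyncClient {
        @Override
        public String backend() { return "JDBC"; }
    }

    /**
     * Instantiates the class named by the config key via its no-arg constructor.
     * Falling back to MetastoreClient when the key is absent is an assumption.
     */
    static SyncClient create(Properties props) {
        String className = props.getProperty(CLIENT_CLASS_KEY, MetastoreClient.class.getName());
        try {
            Class<?> clazz = Class.forName(className);
            return (SyncClient) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot create Hive client: " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(CLIENT_CLASS_KEY, JdbcClient.class.getName());
        System.out.println("Selected backend: " + create(props).backend());
    }
}
```

Selecting by class name rather than by a boolean flag means a further implementation can be added later without introducing another config key.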