Description
When Kudu HMS integration is enabled, several fields are missing in the Hive Metastore when a table is created in Impala with a "stored as kudu" query. This results in a ClassNotFoundException when the table is later queried from Hive:
ERROR : Failed
org.apache.hadoop.hive.metastore.api.MetaException: java.lang.ClassNotFoundException Class not found
at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:98) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:77) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:331) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
When the following sample query is run in Impala to create a Kudu table with Kudu HMS integration enabled, the table is created with the InputFormat, OutputFormat and SerDe Library fields missing:
create table default.kudu_test ( col1 string comment 'col1', col2 string comment 'col2', primary key (col1) ) comment 'kudu_test' stored as kudu;
SerDe Library: | NULL | |
InputFormat: | NULL | |
OutputFormat: | NULL |
Hive Metastore log for the table creation:
INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-124]: 134: source:172.25.35.0 create_table: Table(tableName:kudu_test, dbName:default, owner:root, createTime:0, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1), FieldSchema(name:col2, type:string, comment:col2)], location:, inputFormat:, outputFormat:, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:, serializationLib:, parameters:{}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:
, viewOriginalText:, viewExpandedText:, tableType:MANAGED_TABLE, temporary:false, ownerType:USER)
Running the same query in Impala with Kudu HMS integration disabled, on the other hand, populates these fields when the table is created:
SerDe Library: | org.apache.hadoop.hive.kudu.KuduSerDe | NULL |
InputFormat: | org.apache.hadoop.hive.kudu.KuduInputFormat | NULL |
OutputFormat: | org.apache.hadoop.hive.kudu.KuduOutputFormat | NULL |
Hive Metastore log for table creation:
INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-173]: 183: source:172.25.35.0 create_table_req: Table(tableName:kudu_test, dbName:default, owner:root, createTime:0, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1), FieldSchema(name:col2, type:string, comment:col2)], location:null, inputFormat:org.apache.hadoop.hive.kudu.KuduInputFormat, outputFormat:org.apache.hadoop.hive.kudu.KuduOutputFormat, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.kudu.KuduSerDe, parameters:{}), bucketCols:[], sortCols:[], parameters:null), partitionKeys:[], parameters:
, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, catName:hive, ownerType:USER, accessType:8)
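The difference between the two create_table log entries comes down to three storage descriptor fields. As a minimal sketch, assuming plain Python dictionaries as stand-ins for the Thrift StorageDescriptor (with serializationLib flattened out of serdeInfo) and a hypothetical helper named missing_sd_fields, the check below reports which of those fields are empty in each case:

```python
# Simplified stand-ins for the storage descriptors seen in the two HMS log entries.
# These are plain dicts for illustration, not the real Thrift StorageDescriptor objects.
sd_hms_integration_enabled = {
    "inputFormat": "",
    "outputFormat": "",
    "serializationLib": "",
}
sd_hms_integration_disabled = {
    "inputFormat": "org.apache.hadoop.hive.kudu.KuduInputFormat",
    "outputFormat": "org.apache.hadoop.hive.kudu.KuduOutputFormat",
    "serializationLib": "org.apache.hadoop.hive.kudu.KuduSerDe",
}

# The three fields Hive needs in order to load a deserializer for the table.
REQUIRED_FIELDS = ("inputFormat", "outputFormat", "serializationLib")


def missing_sd_fields(sd):
    """Return the required fields that are absent, None, or empty (hypothetical helper)."""
    return [name for name in REQUIRED_FIELDS if not sd.get(name)]


print(missing_sd_fields(sd_hms_integration_enabled))
print(missing_sd_fields(sd_hms_integration_disabled))
```

With the values taken from the logs above, all three fields are reported for the table created with Kudu HMS integration enabled and none for the table created with it disabled.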
--------------------------------
Code path for table creation when Kudu HMS integration is enabled (Kudu code path):
Quick recap of the steps when creating a Kudu table:
HMSCatalog::CreateTable() -> hive::Table declared and passed to PopulateTable(… , &table) -> Thrift client Execute call -> HMSClient::CreateTable(Table (the one that was just populated), envcontext (default)) -> hms_client.create_table_with_environment_context(table, envcontext).
CreateTable
https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L146 ->
Populate the fields of the table
https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L367
HMS client call
https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_client.cc#L280
-----------------------------
Code path for table creation when Kudu HMS integration is disabled (Impala code path):
CreateTable -> CreateMetaStoreTable
-> line 3248: tbl.setSd(createSd(params));
CreateSd
Comparing the code paths shows that, for a table created without Kudu HMS integration (through Impala), the missing fields are filled in with default values by CreateSd.
These fields are left untouched when the table is created with Kudu HMS integration enabled (Kudu code path).
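The contrast between the two code paths can be modeled in a few lines. This is only a simplified sketch in Python, not the actual Impala or Kudu code: the function names impala_path_sd, kudu_path_sd and fill_missing_defaults are hypothetical, and the default values are the ones reported for the table created with Kudu HMS integration disabled.

```python
# Values observed for the table created through the Impala code path.
KUDU_SD_DEFAULTS = {
    "inputFormat": "org.apache.hadoop.hive.kudu.KuduInputFormat",
    "outputFormat": "org.apache.hadoop.hive.kudu.KuduOutputFormat",
    "serializationLib": "org.apache.hadoop.hive.kudu.KuduSerDe",
}


def impala_path_sd():
    """Model of the Impala path: the storage descriptor is built with the Kudu defaults."""
    return dict(KUDU_SD_DEFAULTS)


def kudu_path_sd():
    """Model of the Kudu path as described above: the three fields are never set."""
    return {name: None for name in KUDU_SD_DEFAULTS}


def fill_missing_defaults(sd):
    """Hypothetical remedy: set only the fields that are still empty, keep the rest."""
    filled = dict(sd)
    for name, default in KUDU_SD_DEFAULTS.items():
        if not filled.get(name):
            filled[name] = default
    return filled


before = kudu_path_sd()
after = fill_missing_defaults(before)
print(before)
print(after == impala_path_sd())
```

In this model, filling the empty fields on the Kudu path yields the same storage descriptor values as the Impala path, while any value that is already set is left alone.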