Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Impala 2.1.1
None
Description
Running SHOW CREATE TABLE on an HBase table in Impala lists the columns in a different order than Hive does. This is a problem if you take the SHOW CREATE TABLE output from Impala and run the resulting CREATE TABLE statement in Hive, because the column order matters: each column must line up, by position, with its entry in the "hbase.columns.mapping" SerDe property.
Example.
Correct (Hive)
CREATE EXTERNAL TABLE `jira_2`(
  `hbasekey` string COMMENT 'from deserializer',
  `project` string COMMENT 'from deserializer',
  `issueid` string COMMENT 'from deserializer',
  `title` string COMMENT 'from deserializer',
  `summary` string COMMENT 'from deserializer',
  `createdts` bigint COMMENT 'from deserializer',
  `updatedts` bigint COMMENT 'from deserializer',
  `issuetype` string COMMENT 'from deserializer',
  `priority` string COMMENT 'from deserializer',
  `resolution` string COMMENT 'from deserializer',
  `affectsversion` string COMMENT 'from deserializer',
  `fixversion` string COMMENT 'from deserializer',
  `component` string COMMENT 'from deserializer',
  `clouderaflags` string COMMENT 'from deserializer',
  `status` string COMMENT 'from deserializer',
  `assignee` string COMMENT 'from deserializer',
  `reporter` string COMMENT 'from deserializer',
  `labels` string COMMENT 'from deserializer')
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'serialization.format'='1',
  'hbase.columns.mapping'=':key, rep:project, rep:issue_id, content:title, content:summary, meta:created#b, meta:updated#b, rep:type, rep:priority, rep:resolution, meta:version, meta:fix_version, meta:component, meta:cloudera_flags, rep:status, rep:assignee, rep:reporter, meta:labelsStr')
LOCATION 'hdfs://nameservice1/user/hive/warehouse/jira'
TBLPROPERTIES (
  'hbase.table.name'='jira_ticket',
  'transient_lastDdlTime'='1405092795')
Incorrect (Impala)
CREATE EXTERNAL TABLE default.jira_2 (
  hbasekey STRING,
  summary STRING,
  title STRING,
  clouderaflags STRING,
  component STRING,
  createdts BIGINT,
  fixversion STRING,
  updatedts BIGINT,
  affectsversion STRING,
  assignee STRING,
  issueid STRING,
  priority STRING,
  project STRING,
  reporter STRING,
  resolution STRING,
  status STRING,
  issuetype STRING,
  labels STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'=':key, rep:project, rep:issue_id, content:title, content:summary, meta:created#b, meta:updated#b, rep:type, rep:priority, rep:resolution, meta:version, meta:fix_version, meta:component, meta:cloudera_flags, rep:status, rep:assignee, rep:reporter, meta:labelsStr',
  'serialization.format'='1')
TBLPROPERTIES (
  'hbase.table.name'='jira_ticket',
  'transient_lastDdlTime'='1405092795',
  'storage_handler'='org.apache.hadoop.hive.hbase.HBaseStorageHandler')
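To make the impact concrete, here is a rough Python sketch that pairs each column with its "hbase.columns.mapping" entry by position, the way Hive would when the statement is replayed. The column lists and the mapping string are copied from the two statements above; the helper names (bind, misbound_columns) are made up for this illustration and are not part of Hive or Impala.

```python
# Mapping string copied from the 'hbase.columns.mapping' property in the report.
MAPPING = (
    ":key, rep:project, rep:issue_id, content:title, content:summary, "
    "meta:created#b, meta:updated#b, rep:type, rep:priority, rep:resolution, "
    "meta:version, meta:fix_version, meta:component, meta:cloudera_flags, "
    "rep:status, rep:assignee, rep:reporter, meta:labelsStr"
)

# Column order as printed by Hive (the order the mapping was written for).
HIVE_COLUMNS = [
    "hbasekey", "project", "issueid", "title", "summary", "createdts",
    "updatedts", "issuetype", "priority", "resolution", "affectsversion",
    "fixversion", "component", "clouderaflags", "status", "assignee",
    "reporter", "labels",
]

# Column order as printed by Impala's SHOW CREATE TABLE.
IMPALA_COLUMNS = [
    "hbasekey", "summary", "title", "clouderaflags", "component", "createdts",
    "fixversion", "updatedts", "affectsversion", "assignee", "issueid",
    "priority", "project", "reporter", "resolution", "status", "issuetype",
    "labels",
]


def bind(columns, mapping):
    """Pair the i-th column with the i-th mapping entry (hypothetical helper)."""
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(columns):
        raise ValueError("column count does not match mapping entry count")
    return dict(zip(columns, entries))


def misbound_columns(reference, candidate, mapping):
    """Return {column: (intended entry, entry it would get)} for every mismatch."""
    intended = bind(reference, mapping)
    actual = bind(candidate, mapping)
    return {
        name: (intended[name], actual[name])
        for name in candidate
        if intended[name] != actual[name]
    }


if __name__ == "__main__":
    wrong = misbound_columns(HIVE_COLUMNS, IMPALA_COLUMNS, MAPPING)
    for name, (want, got) in wrong.items():
        print(f"{name}: intended {want}, would be bound to {got}")
```

With the lists above, only hbasekey, createdts and labels sit at the same position in both outputs, so every other column would end up bound to the wrong HBase column if the Impala statement were run in Hive as-is.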
The workaround is simply to not use Impala's SHOW CREATE TABLE for HBase tables, but this should be fixed at some point.
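Until then, a small guard script could flag the problem before a statement is replayed. The sketch below is only an illustration: column_names and same_column_order are hypothetical helpers, and the parsing assumes a plain CREATE TABLE statement whose quoted strings contain no escaped single quotes. It pulls the column names out of two SHOW CREATE TABLE outputs and checks that they appear in the same order.

```python
import re


def column_names(ddl):
    """Return column names, in declaration order, from a CREATE TABLE statement.

    Assumptions for this sketch: the first parenthesised group is the column
    list, and quoted strings contain no escaped single quotes.
    """
    # Blank out quoted strings so commas or parentheses inside comments are ignored.
    ddl = re.sub(r"'[^']*'", "''", ddl)
    start = ddl.index("(") + 1
    depth, end = 1, start
    while depth:
        ch = ddl[end]
        depth += (ch == "(") - (ch == ")")
        end += 1
    body = ddl[start:end - 1]
    # Drop type arguments such as DECIMAL(10,2) so their commas do not split columns.
    body = re.sub(r"\([^()]*\)", "", body)
    return [
        part.split()[0].strip("`").lower()
        for part in body.split(",")
        if part.strip()
    ]


def same_column_order(ddl_a, ddl_b):
    """True when both statements declare the same columns in the same order."""
    return column_names(ddl_a) == column_names(ddl_b)


if __name__ == "__main__":
    # Shortened, made-up statements in the style of the two outputs above.
    hive_ddl = (
        "CREATE EXTERNAL TABLE `t`( `k` string COMMENT 'from deserializer', "
        "`a` string COMMENT 'x, y', `b` bigint) ROW FORMAT SERDE 'x'"
    )
    impala_ddl = (
        "CREATE EXTERNAL TABLE default.t ( k STRING, b BIGINT, a STRING ) "
        "STORED BY 'h'"
    )
    print(column_names(hive_ddl))
    print(column_names(impala_ddl))
    print(same_column_order(hive_ddl, impala_ddl))
```

This only detects a reordering; it cannot recover the right order without the Hive output, so the Hive statement remains the one to trust.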