[IMPALA-886] Always display HBase cols in same order as CREATE TABLE statement - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: Impala 1.3
Fix Version/s: Impala 4.2.0
Component/s: Catalog
Labels:

Target Version: Product Backlog

Description

I noticed a discrepancy between Impala and Hive in how column order is handled for HBase tables.
I think it would be preferable to use the same behavior as Hive; otherwise life becomes
more complicated for anyone doing INSERT or SELECT * with an HBase table through Impala.
(And I have to add caveats and usage notes in the docs.)

Repro:

In the HBase shell, create a table with a single column family. I think most Impala tests use one column family per column, in which case you won't notice this behavior.

hbase(main):008:0> create 'sample_data_fast','cols'
0 row(s) in 71.8750 seconds

In Hive shell, create a mapping table. Notice how DESCRIBE repeats back the columns in the same order as in CREATE TABLE.

hive> create external table sample_data_fast (id string, val int, zfill string, name string, assertion boolean)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" =
> ":key,cols:val,cols:zfill,cols:name,cols:assertion")
> TBLPROPERTIES("hbase.table.name" = "sample_data_fast")
> ;
OK
Time taken: 1.7 seconds
hive> desc sample_data_fast;
OK
id string from deserializer
val int from deserializer
zfill string from deserializer
name string from deserializer
assertion boolean from deserializer
Time taken: 0.302 seconds

Now try the same DESCRIBE in impala-shell. The key column (id) is listed first. Then all the other columns, part of the same column family, are listed in alphabetical order rather than the order from CREATE TABLE:

[localhost:21000] > desc sample_data_fast;
Query: describe sample_data_fast
+-----------+---------+---------+
| name      | type    | comment |
+-----------+---------+---------+
| id        | string  |         |
| assertion | boolean |         |
| name      | string  |         |
| val       | int     |         |
| zfill     | string  |         |
+-----------+---------+---------+
Returned 5 row(s) in 0.02s
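
To make the difference concrete, here is a small Python sketch of the two orderings. It is a toy model of the behavior seen in the two DESCRIBE outputs, not Hive's or Impala's actual catalog code: the function names are made up for illustration, and sorting the non-key columns by (column family, qualifier) is an assumption that happens to reproduce the impala-shell output.

```python
# Column names in CREATE TABLE order, and the matching hbase.columns.mapping.
COLUMNS = ["id", "val", "zfill", "name", "assertion"]
MAPPING = ":key,cols:val,cols:zfill,cols:name,cols:assertion"


def hive_order(columns):
    """Hive's DESCRIBE repeats the columns in CREATE TABLE order."""
    return list(columns)


def observed_impala_order(columns, mapping):
    """Model of the order impala-shell reported: row key first, then the rest sorted.

    Assumption (not confirmed by the report): the remaining columns are
    sorted by (column family, qualifier).
    """
    entries = mapping.split(",")
    if len(entries) != len(columns):
        raise ValueError("mapping must have one entry per column")

    key_cols = []
    others = []
    for name, entry in zip(columns, entries):
        if entry == ":key":
            key_cols.append(name)
        else:
            family, qualifier = entry.split(":", 1)
            others.append((family, qualifier, name))

    return key_cols + [name for _, _, name in sorted(others)]


if __name__ == "__main__":
    print(hive_order(COLUMNS))
    print(observed_impala_order(COLUMNS, MAPPING))
```

For the table in this repro, the first list is id, val, zfill, name, assertion and the second is id, assertion, name, val, zfill, matching the two DESCRIBE outputs.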

Thus if you already had Hive code that was doing SELECT * from an HBase table like this, you would get a different result set (different column order) in Impala.
If you tried to copy from an HDFS table via 'INSERT INTO hbase_table SELECT * FROM hdfs_table', you would get an error because the columns don't line up. With a separate column family for each column the discrepancy is masked, because the alphabetical ordering only shows up when a column family holds more than one column.
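
As a rough illustration of why the positional INSERT ... SELECT * goes wrong, the Python sketch below pairs the columns up by position. It assumes hdfs_table was created with the same columns in CREATE TABLE order (the report does not show its definition), and the helper names are hypothetical. It also builds a statement with an explicit select list, which is one possible way to avoid depending on column order, not something verified here.

```python
# Assumed layout of hdfs_table: same columns, in CREATE TABLE order.
SOURCE = [("id", "string"), ("val", "int"), ("zfill", "string"),
          ("name", "string"), ("assertion", "boolean")]
# Column order that impala-shell reported for the HBase table.
TARGET = [("id", "string"), ("assertion", "boolean"), ("name", "string"),
          ("val", "int"), ("zfill", "string")]


def type_mismatches(source, target):
    """Return (position, source column, target column) where the types differ."""
    return [
        (pos, f"{s_name} {s_type}", f"{t_name} {t_type}")
        for pos, ((s_name, s_type), (t_name, t_type))
        in enumerate(zip(source, target))
        if s_type != t_type
    ]


def insert_with_explicit_columns(target_table, source_table, target):
    """Build an INSERT that names the source columns in the target's order."""
    select_list = ", ".join(name for name, _ in target)
    return f"INSERT INTO {target_table} SELECT {select_list} FROM {source_table}"


if __name__ == "__main__":
    for mismatch in type_mismatches(SOURCE, TARGET):
        print(mismatch)
    print(insert_with_explicit_columns("hbase_table", "hdfs_table", TARGET))
```

Positions 1, 3 and 4 pair columns of different types. Position 2 pairs zfill with name, both strings, so that mix-up would not show up as a type mismatch at all.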

Since Hive is preserving the column order, the relevant info must be there in the metastore.

People

Assignee: Csaba Ringhofer

Reporter: John Russell

Votes: 1

Watchers: 5

Dates

Created: 14/Mar/14 23:02

Updated: 01/Dec/22 06:38

Resolved: 01/Dec/22 06:38