[HIVE-6198] ORC file and struct column names are case sensitive - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.11.0, 0.12.0
Fix Version/s: 1.0.0
Component/s: CLI, File Formats
Labels:
None

Description

HiveQL document states that the "Table names and column names are case insensitive". But the struct behavior for ORC file is different.
Consider a sample text file:

$ cat data.txt
line1|key11:value11,key12:value12,key13:value13|a,b,c|one,two
line2|key21:value21,key22:value22,key23:value23|d,e,f|three,four
line3|key31:value31,key32:value32,key33:value33|g,h,i|five,six

Creating a table stored as txt and then using this to create a table stored as orc

CREATE TABLE orig (
  str STRING,
  mp  MAP<STRING,STRING>,
  lst ARRAY<STRING>,
  strct STRUCT<A:STRING,B:STRING>
) ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '|'
    COLLECTION ITEMS TERMINATED BY ','
    MAP KEYS TERMINATED BY ':';
LOAD DATA LOCAL INPATH 'data.txt' INTO TABLE orig;

CREATE TABLE tableorc (
  str STRING,
  mp  MAP<STRING,STRING>,
  lst ARRAY<STRING>,
  strct STRUCT<A:STRING,B:STRING>
) STORED AS ORC;
INSERT OVERWRITE TABLE tableorc SELECT * FROM orig;

Suppose we project columns or read the strct columns for both table types, here are the results. I have also tested the same with RC. The behavior is similar to txt files.

hive> SELECT * FROM orig;
line1   {"key11":"value11","key12":"value12","key13":"value13"} ["a","b","c"]  
{"a":"one","b":"two"}
line2   {"key21":"value21","key22":"value22","key23":"value23"} ["d","e","f"]  
{"a":"three","b":"four"}
line3   {"key31":"value31","key32":"value32","key33":"value33"} ["g","h","i"]  
{"a":"five","b":"six"}
Time taken: 0.126 seconds, Fetched: 3 row(s)

hive> SELECT * FROM tableorc;
line1   {"key12":"value12","key11":"value11","key13":"value13"} ["a","b","c"]  
{"A":"one","B":"two"}
line2   {"key21":"value21","key23":"value23","key22":"value22"} ["d","e","f"]  
{"A":"three","B":"four"}
line3   {"key33":"value33","key31":"value31","key32":"value32"} ["g","h","i"]  
{"A":"five","B":"six"}
Time taken: 0.178 seconds, Fetched: 3 row(s)

hive> SELECT strct FROM tableorc;
{"a":"one","b":"two"}
{"a":"three","b":"four"}
{"a":"five","b":"six"}

hive>SELECT strct.A FROM orig;
one
three
five

hive>SELECT strct.a FROM orig;
one
three
five

hive>SELECT strct.A FROM tableorc;
one
three
five

hive>SELECT strct.a FROM tableorc;
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL

So it seems that ORC behaves differently for struct columns. Also why are we storing the column names for struct for the other types as CASE SENSITIVE? What is the standard for Hive QL with respect to structs?

Regards
Viraj

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HIVE-6198.3.patch.txt
07/Oct/14 08:45
23 kB
Navis Ryu
HIVE-6198.2.patch.txt
09/Jul/14 09:09
5 kB
Navis Ryu
HIVE-6198.1.patch.txt
08/Jul/14 08:59
3 kB
Navis Ryu

Issue Links

is duplicated by

HIVE-8870 errors when selecting a struct field within an array from ORC based tables

Closed

Activity

People

Assignee:: Navis Ryu

Reporter:: Viraj Bhat

Votes:: 0 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 14/Jan/14 20:45

Updated:: 19/Feb/15 14:44

Resolved:: 15/Dec/14 03:54