[HIVE-10086] Hive throws error when accessing Parquet file schema using field name match - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.0
Fix Version/s: 1.2.0
Component/s: None
Labels: None

Description

When a Hive table schema contains only a subset of the fields in a Parquet file's schema, the values should still be readable as long as the field names match. This does not work when the schema contains a struct<> data type and the Hive schema lists only some of the struct's fields. Hive throws an error instead.

Steps to reproduce:

First, create a Parquet table and insert a row into it:

CREATE TABLE test1 (id int, name string, address struct<number:int,street:string,zip:string>) STORED AS PARQUET;

INSERT INTO TABLE test1 SELECT 1, 'Roger', named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM srcpart LIMIT 1;

Note: srcpart can be any non-empty table. It only serves as a row source for the INSERT ... SELECT statement.

The table above produces the following Parquet file schema:

message hive_schema {
  optional int32 id;
  optional binary name (UTF8);
  optional group address {
    optional int32 number;
    optional binary street (UTF8);
    optional binary zip (UTF8);
  }
}

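Next, create a table whose schema contains only a portion of the file schema and load the Parquet file generated above into it. The statement below reuses the name test1, so the first table has to be dropped (or a different database used) before running it. A query on the plain column works, but a query on the struct column fails: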

CREATE TABLE test1 (name string, address struct<street:string>) STORED AS PARQUET;

LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1;

hive> SELECT name FROM test1;
OK
Roger
Time taken: 0.071 seconds, Fetched: 1 row(s)

hive> SELECT address FROM test1;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable
Time taken: 0.085 seconds

I would expect Hive to read the Parquet fields whose names match the table schema, but it throws an error instead.
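The expected behavior can be illustrated with a small, self-contained Python sketch. It is not Hive's implementation: the dict-based schema model and the helper `project_by_name` are hypothetical, introduced only to show what matching by field name at every struct level would return for the row in this report.

```python
# Hypothetical model: a schema is a dict of field name -> type,
# and a struct type is itself a nested dict.

# The narrower Hive table schema from the report.
TABLE_SCHEMA = {"name": "string", "address": {"street": "string"}}

# The single row as stored in the Parquet file.
FILE_ROW = {
    "id": 1,
    "name": "Roger",
    "address": {"number": 8600, "street": "Congress Ave.", "zip": "87366"},
}


def project_by_name(row, requested):
    """Keep only the requested fields, matching by name at every struct level."""
    result = {}
    for field, field_type in requested.items():
        value = row.get(field)  # a field missing from the file reads as None
        if isinstance(field_type, dict) and value is not None:
            # Struct: recurse so nested fields are matched by name as well.
            result[field] = project_by_name(value, field_type)
        else:
            result[field] = value
    return result


projected = project_by_name(FILE_ROW, TABLE_SCHEMA)
print(projected)
# {'name': 'Roger', 'address': {'street': 'Congress Ave.'}}
```

Under this model, selecting address would yield only the street field, regardless of where street sits inside the struct in the file.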

Attachments

HIVE-10086.5.patch (26 kB, 27/Mar/15 18:37, Sergio Peña)
HiveGroup.parquet (0.7 kB, 25/Mar/15 22:37, Sergio Peña)

Issue Links

relates to: HIVE-10135 Add qtest to access struct<> data type with parquet format after parquet column index access enabled (Closed)

links to: Review Board

People

Assignee: Sergio Peña

Reporter: Sergio Peña

Votes: 1

Watchers: 9

Dates

Created: 25/Mar/15 16:31

Updated: 18/May/15 19:52

Resolved: 30/Mar/15 00:07