[HIVE-9502] Parquet cannot read Map types from files written with Hive <= 0.12 - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.14.0
Fix Version/s: 1.1.0
Component/s: None
Labels: None

Description

When reading a Parquet file written by Hive <= 0.12, the following error is thrown:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.hadoop.hive.ql.io.parquet.serde.AbstractParquetMapInspector.getMap(AbstractParquetMapInspector.java:73)
        at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:519)
        at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:443)
        at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:427)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
        at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
        ... 9 more
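The trace ends in `AbstractParquetMapInspector.getMap`. The example below is a simplified illustration and is not Hive's actual code. It assumes each map entry reaches the inspector as an array laid out as `[key, value]`. If the reader built an entry from a schema it did not treat as a map, and that entry holds only one element, reading the value slot at index 1 fails with this same exception type. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, NOT Hive's AbstractParquetMapInspector.
// Assumption: each map entry arrives as an Object[] laid out as [key, value].
final class EntryArrayMapReader {
    private EntryArrayMapReader() {}

    static Map<Object, Object> getMap(Object[][] entries) {
        Map<Object, Object> result = new LinkedHashMap<>();
        for (Object[] entry : entries) {
            // entry[0] is the key and entry[1] the value. If the entry holds
            // only one element, entry[1] throws ArrayIndexOutOfBoundsException.
            result.put(entry[0], entry[1]);
        }
        return result;
    }
}
```

A well-formed entry such as `{"a", "1"}` yields a one-element map. A one-element entry such as `{"a"}` throws, which is the same kind of failure reported in the top frame of the trace.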

This happens because older versions of Hive (<= 0.12) write Map types with the following schema, in which the MAP_KEY_VALUE annotation is placed on the outer group:

optional group m1 (MAP_KEY_VALUE) {
	repeated group map {
		required binary key;
	optional binary value;
	}
}

PARQUET-113 proposes new annotations for Parquet nested types (see the Maps section):
https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md#maps

With those annotations, the correct schema is:

optional group m1 (MAP) {
	repeated group map (MAP_KEY_VALUE) {
		required binary key;
		optional binary key;
	}
}
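Here is a small structural check for this layout. The outer group is annotated MAP and has exactly one child. That child is a repeated group holding two fields, and the first one (the key) is required. The `SchemaField` model and `MapLayoutCheck` are hypothetical simplifications, not parquet-mr's `GroupType`/`PrimitiveType` classes. This is only a sketch under those assumptions, and it does not check field names.

```java
import java.util.List;

// Hypothetical minimal schema model, NOT parquet-mr's Type/GroupType classes.
final class SchemaField {
    final String name;
    final String repetition;          // "required", "optional" or "repeated"
    final String annotation;          // e.g. "MAP", "MAP_KEY_VALUE", or null
    final List<SchemaField> children; // empty for primitive (leaf) fields

    SchemaField(String name, String repetition, String annotation, List<SchemaField> children) {
        this.name = name;
        this.repetition = repetition;
        this.annotation = annotation;
        this.children = children;
    }
}

final class MapLayoutCheck {
    private MapLayoutCheck() {}

    // True if `group` has the shape of the new map schema: a MAP-annotated
    // group whose only child is a repeated group holding exactly two fields,
    // the first of which (the key) is required.
    static boolean isNewStyleMap(SchemaField group) {
        if (!"MAP".equals(group.annotation) || group.children.size() != 1) {
            return false;
        }
        SchemaField keyValue = group.children.get(0);
        if (!"repeated".equals(keyValue.repetition) || keyValue.children.size() != 2) {
            return false;
        }
        return "required".equals(keyValue.children.get(0).repetition);
    }
}
```

Under this check, the old Hive <= 0.12 schema is rejected because its outer group carries MAP_KEY_VALUE rather than MAP. That mismatch is what a strict reader trips over.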

Hive should also remain backward compatible with the old schema, so that Parquet files written by Hive <= 0.12 can still be read.
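The backward-compatibility rule can be sketched as follows. A group counts as a map if it is annotated either MAP (the new layout) or MAP_KEY_VALUE (the layout written by Hive <= 0.12). In both cases it must wrap a single repeated group with two fields. `SchemaNode`, `MapLayout` and `CompatibleMapDetector` are hypothetical names, not Hive or parquet-mr classes. The patches attached to this issue may implement the check differently.

```java
import java.util.List;

// Hypothetical minimal schema model, NOT parquet-mr's Type/GroupType classes.
final class SchemaNode {
    final String name;
    final String repetition;         // "required", "optional" or "repeated"
    final String annotation;         // e.g. "MAP", "MAP_KEY_VALUE", or null
    final List<SchemaNode> children; // empty for primitive (leaf) fields

    SchemaNode(String name, String repetition, String annotation, List<SchemaNode> children) {
        this.name = name;
        this.repetition = repetition;
        this.annotation = annotation;
        this.children = children;
    }
}

enum MapLayout { NEW_MAP, LEGACY_MAP_KEY_VALUE, NOT_A_MAP }

final class CompatibleMapDetector {
    private CompatibleMapDetector() {}

    // Accepts both layouts: outer group annotated MAP (new) or
    // MAP_KEY_VALUE (Hive <= 0.12). Either way the outer group must wrap
    // a single repeated group with two fields (key and value).
    static MapLayout classify(SchemaNode group) {
        boolean wrapsRepeatedPair = group.children.size() == 1
                && "repeated".equals(group.children.get(0).repetition)
                && group.children.get(0).children.size() == 2;
        if (!wrapsRepeatedPair) {
            return MapLayout.NOT_A_MAP;
        }
        if ("MAP".equals(group.annotation)) {
            return MapLayout.NEW_MAP;
        }
        if ("MAP_KEY_VALUE".equals(group.annotation)) {
            return MapLayout.LEGACY_MAP_KEY_VALUE;
        }
        return MapLayout.NOT_A_MAP;
    }

    // Both layouts keep the key/value pair in the same position, so one
    // map converter can serve both once classify() accepts the group.
    static SchemaNode keyValueGroup(SchemaNode mapGroup) {
        if (classify(mapGroup) == MapLayout.NOT_A_MAP) {
            throw new IllegalArgumentException("not a map group: " + mapGroup.name);
        }
        return mapGroup.children.get(0);
    }
}
```

The sketch keeps the legacy and new layouts distinct in `MapLayout` so a caller can log or count legacy files. Both still resolve to the same repeated key/value group.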

Attachments

- alltypesparquet (28/Jan/15 23:15, 1 kB, Sergio Peña)
- HIVE-9502.1.patch (28/Jan/15 22:25, 0.9 kB, Sergio Peña)
- HIVE-9502.2.patch (28/Jan/15 22:43, 4 kB, Sergio Peña)
- HIVE-9502.3.patch (29/Jan/15 20:08, 5 kB, Sergio Peña)
- HIVE-9502.4.patch (29/Jan/15 20:30, 4 kB, Sergio Peña)

Issue Links

is broken by: HIVE-8909 Hive doesn't correctly read Parquet nested types (Resolved)

People

Assignee: Sergio Peña

Reporter: Sergio Peña

Votes: 0

Watchers: 3

Dates

Created: 28/Jan/15 22:21

Updated: 30/Jan/15 20:50

Resolved: 30/Jan/15 20:49