[HIVE-13300] Hive on spark throws exception for multi-insert with join - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0
Fix Version/s: 2.1.0
Component/s: Spark
Labels:
None

Description

For certain multi-insert queries, Hive on Spark throws a deserialization error.

create table status_updates(userid int,status string,ds string);
create table profiles(userid int,school string,gender int);
drop table school_summary; create table school_summary(school string,cnt int) partitioned by (ds string);
drop table gender_summary; create table gender_summary(gender int,cnt int) partitioned by (ds string);

insert into status_updates values (1, "status_1", "2016-03-16");
insert into profiles values (1, "school_1", 0);

set hive.auto.convert.join=false;
set hive.execution.engine=spark;

FROM (SELECT a.status, b.school, b.gender
FROM status_updates a JOIN profiles b
ON (a.userid = b.userid and
a.ds='2009-03-20' )
) subq1
INSERT OVERWRITE TABLE gender_summary
PARTITION(ds='2009-03-20')
SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender
INSERT OVERWRITE TABLE school_summary
PARTITION(ds='2009-03-20')
SELECT subq1.school, COUNT(1) GROUP BY subq1.school

Error:

16/03/17 13:29:00 [task-result-getter-3]: WARN scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3, localhost): java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x128x0x0 with properties {serialization.sort.order.null=a, columns=reducesinkkey0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=int}
	at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:279)
	at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
	at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x128x0x0 with properties {serialization.sort.order.null=a, columns=reducesinkkey0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=int}
	at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:251)
	... 12 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
	at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:241)
	at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:249)
	... 12 more
Caused by: java.io.EOFException
	at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
	at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:597)
	at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:288)
	at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:237)
	... 13 more

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HIVE-13300.2.patch
21/Mar/16 08:01
15 kB
Szehon Ho
HIVE-13300.3.patch
22/Mar/16 05:14
16 kB
Szehon Ho
HIVE-13300.patch
18/Mar/16 02:25
9 kB
Szehon Ho

Activity

People

Assignee:: Szehon Ho

Reporter:: Szehon Ho

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 17/Mar/16 20:37

Updated:: 21/Jun/16 15:43

Resolved:: 24/Mar/16 18:11