[HIVE-18301] Investigate to enable MapInput cache in Hive on Spark - ASF JIRA

Details

Type: Bug
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

Previously, in ~~HIVE-8920~~, an IOContext problem was found in MapTran when the Spark RDD cache was enabled, so we disabled the RDD cache for MapTran in SparkPlanGenerator. The problem is that IOContext does not seem to be initialized correctly in Spark yarn client/cluster mode, which causes an exception like the following:

Job aborted due to stage failure: Task 93 in stage 0.0 failed 4 times, most recent failure: Lost task 93.3 in stage 0.0 (TID 616, bdpe48): java.lang.RuntimeException: Error processing row: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:165)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
	at org.apache.spark.scheduler.Task.run(Task.scala:85)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:101)
	at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:516)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1187)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:546)
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:152)
	... 12 more

Driver stacktrace:

In yarn client/cluster mode, ExecMapperContext#currentInputPath is sometimes null when the RDD cache is enabled.
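
One way to picture this kind of failure, as a minimal sketch and not Hive's actual code: suppose the input path is published into a per-thread context only as a side effect of reading from the source. If cached rows are later replayed on a thread where no reader ran, the consumer finds a null path and throws. Every name below (InputPathSketch, CURRENT_INPUT_PATH, readFromSource, process, failsOnFreshThread) is hypothetical, and whether this is the actual root cause of the NullPointerException above is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InputPathSketch {
    // Hypothetical per-thread context slot; a stand-in, not Hive's IOContext.
    static final ThreadLocal<String> CURRENT_INPUT_PATH = new ThreadLocal<>();

    // Hypothetical reader: publishing the path is a side effect of reading.
    static List<String> readFromSource(String path, List<String> rows) {
        CURRENT_INPUT_PATH.set(path);
        return new ArrayList<>(rows); // the returned copy plays the role of cached rows
    }

    // Hypothetical consumer: dereferences the path without a null check,
    // so it throws NullPointerException when no reader ran on this thread.
    static String process(String row) {
        String path = CURRENT_INPUT_PATH.get();
        return path.concat("#").concat(row);
    }

    // Replays cached rows on a new thread, where the ThreadLocal was never set,
    // and reports whether a NullPointerException was observed.
    static boolean failsOnFreshThread(List<String> cachedRows) {
        final boolean[] sawNpe = {false};
        Thread worker = new Thread(() -> {
            try {
                for (String row : cachedRows) {
                    process(row);
                }
            } catch (NullPointerException e) {
                sawNpe[0] = true;
            }
        });
        worker.start();
        try {
            worker.join(); // join makes the worker's write to sawNpe visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sawNpe[0];
    }

    public static void main(String[] args) {
        List<String> cached = readFromSource("/warehouse/t/part-0", Arrays.asList("a", "b"));
        // Same thread as the reader: the path is set, so processing succeeds.
        System.out.println(process(cached.get(0)));
        // Different thread, cached rows, no reader: the path is null.
        System.out.println("NPE on fresh thread: " + failsOnFreshThread(cached));
    }
}
```

Under this assumption, a fix would have to populate the context independently of the read path, or carry the input path alongside the cached rows.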

Attachments

- HIVE-18301.3.patch, 02/Feb/18 09:55, 67 kB, liyunzhang
- HIVE-18301.2.patch, 31/Jan/18 22:27, 55 kB, liyunzhang
- HIVE-18301.1.patch, 26/Jan/18 08:24, 63 kB, liyunzhang
- HIVE-18301.patch, 24/Jan/18 05:37, 51 kB, liyunzhang

Issue Links

blocks: HIVE-17486 Enable SharedWorkOptimizer in tez on HOS (Patch Available)

People

Assignee: liyunzhang

Reporter: liyunzhang

Votes: 0

Watchers: 7

Dates

Created: 19/Dec/17 05:26

Updated: 02/Feb/18 09:59