Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
Description
Certain queries fail when spark.master is not set, even though the code defaults it to yarn-cluster whenever the user and the config files leave it unset: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java#L51
The odd part is that the first run succeeds thanks to this default, while subsequent runs fail because the HiveConf is refreshed without the default being set again (see the sketch below).
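The pattern can be sketched as follows. This is a minimal Java illustration, assuming the Hadoop Configuration API; the class and method names mirror the stack trace, but the bodies are simplified for exposition and are not the actual Hive source:

    import org.apache.hadoop.conf.Configuration;

    public class SparkMasterDefaultSketch {

        // HiveSparkClientFactory (line 51 in the link above) applies the
        // yarn-cluster default only to the Spark configuration it is
        // building, without writing it back into the session's HiveConf.
        static String resolveMaster(Configuration hiveConf) {
            String master = hiveConf.get("spark.master");
            return (master != null) ? master : "yarn-cluster"; // default used once, never persisted
        }

        // SparkUtilities.isDedicatedCluster reads spark.master straight
        // from the HiveConf it is handed. After the HiveConf is refreshed
        // the value is null again, so startsWith() throws the
        // NullPointerException seen in the trace below.
        static boolean isDedicatedCluster(Configuration conf) {
            String master = conf.get("spark.master");
            return master.startsWith("yarn") || master.startsWith("local"); // NPE when master == null
        }
    }

Because the default never reaches the HiveConf itself, any code that re-reads spark.master after a refresh, such as HashTableLoader calling isDedicatedCluster, gets null.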
The exception is as follows:
Job aborted due to stage failure: Task 40 in stage 1.0 failed 4 times, most recent failure: Lost task 40.3 in stage 1.0 (TID 22, d2409.halxg.cloudera.com): java.lang.RuntimeException: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:154)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
	at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
	at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:117)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:197)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
	... 16 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.isDedicatedCluster(SparkUtilities.java:108)
	at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:124)
	at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:114)
	... 24 more
Driver stacktrace:
Issue Links
- duplicates HIVE-12616: NullPointerException when spark session is reused to run a mapjoin (Closed)