Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.14.0
Environment: spark-1.1.0
Description
Whenever I try to run a script on a Spark cluster, I get the following error:
ERROR 0: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (...): java.lang.RuntimeException: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at [1-2[-1,-1]]
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:62)
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:34)
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
After debugging, I found that the problem is in SchemaTupleBackend. SparkLauncher initializes this class on the driver, but that initialization is lost when the job is shipped to the executors, so when POOutputConsumerIterator tries to fetch the results and SchemaTupleBackend.newSchemaTupleFactory(...) is called, it throws a RuntimeException.
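Because each executor runs in its own JVM, static state set up by SparkLauncher on the driver never reaches the executors, so the backend has to be initialized again on the executor side before newSchemaTupleFactory(...) is reached. Below is a minimal sketch of the kind of per-JVM guard the Spark converters could call first, e.g. from POOutputConsumerIterator before it starts reading results. The ExecutorInitializer/ensureInitialized names are hypothetical (not existing Pig code), and the SchemaTupleBackend.initialize(Configuration, PigContext) call is assumed to match the initializer Pig's MapReduce tasks invoke during task setup; the exact signature should be checked against the Pig version in use.
{code:java}
// Sketch only: a per-executor-JVM guard that initializes SchemaTupleBackend
// before any physical operator asks for a schema tuple factory.
import org.apache.hadoop.conf.Configuration;
import org.apache.pig.data.SchemaTupleBackend;
import org.apache.pig.impl.PigContext;

public final class ExecutorInitializer {
    // Set once per executor JVM; volatile so every task thread sees the update.
    private static volatile boolean initialized = false;

    private ExecutorInitializer() {}

    public static void ensureInitialized(Configuration conf, PigContext pigContext) {
        if (initialized) {
            return;
        }
        synchronized (ExecutorInitializer.class) {
            if (!initialized) {
                try {
                    // Assumed call: the same initialization MR tasks perform on
                    // the task side; verify the signature for this Pig version.
                    SchemaTupleBackend.initialize(conf, pigContext);
                } catch (Exception e) {
                    throw new RuntimeException(
                        "Could not initialize SchemaTupleBackend on executor", e);
                }
                initialized = true;
            }
        }
    }
}
{code}
For this to work, the Configuration and PigContext would also have to be made available on the executors (serialized or broadcast with the job), which is essentially the same gap described in PIG-4232 for UDFContext.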
Issue Links
- relates to PIG-4232: UDFContext is not initialized in executors when running on Spark cluster (Closed)