Pig / PIG-4059 Pig on Spark / PIG-4228

SchemaTupleBackend error when working on a Spark 1.1.0 cluster


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.14.0
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels: spark-1.1.0

    Description

      Whenever I try to run a script on a Spark cluster, I get the following error:

      ERROR 0: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (...): java.lang.RuntimeException: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at [1-2[-1,-1]]
      org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:62)
      org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
      scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
      scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
      scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
      org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:34)
      org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
      scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
      scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
      scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
      scala.collection.Iterator$class.foreach(Iterator.scala:727)
      scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
      org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
      org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
      org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
      org.apache.spark.scheduler.Task.run(Task.scala:54)
      org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      java.lang.Thread.run(Thread.java:745)

      After debugging, I have found that the problem is inside SchemaTupleBackend. SparkLauncher initializes this class on the driver, but that initialization is lost when the job is shipped to the executors; when POOutputConsumerIterator then tries to fetch the results, SchemaTupleBackend.newSchemaTupleFactory(...) is called in a JVM where the backend was never initialized, and it throws a RuntimeException.
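
      As a minimal sketch of one possible direction (not a committed fix), the Spark executor side could mirror what the MR backend does in its task setup() and initialize SchemaTupleBackend lazily, once per executor JVM, before the converters start pulling tuples. The class and method names below (SchemaTupleBackendInit, ensureInitialized) are hypothetical, and the sketch assumes the job Configuration and a deserialized PigContext are available to executor code and that the two-argument SchemaTupleBackend.initialize(Configuration, PigContext) overload used by the MR setup path applies here:

      import java.io.IOException;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.pig.data.SchemaTupleBackend;
      import org.apache.pig.impl.PigContext;

      // Hypothetical helper: guards SchemaTupleBackend.initialize(...) so it runs
      // exactly once in each executor JVM. After it runs, calls to
      // SchemaTupleBackend.newSchemaTupleFactory(...) on that executor should no
      // longer throw a RuntimeException for an uninitialized backend.
      public final class SchemaTupleBackendInit {

          private static volatile boolean initialized = false;

          private SchemaTupleBackendInit() {}

          public static void ensureInitialized(Configuration jobConf, PigContext pigContext)
                  throws IOException {
              if (!initialized) {
                  synchronized (SchemaTupleBackendInit.class) {
                      if (!initialized) {
                          // Same call the MR mapper/reducer setup() makes on task JVMs;
                          // it registers the generated SchemaTuple classes in this JVM.
                          SchemaTupleBackend.initialize(jobConf, pigContext);
                          initialized = true;
                      }
                  }
              }
          }
      }

      A converter feeding POOutputConsumerIterator would then call SchemaTupleBackendInit.ensureInitialized(conf, pigContext) before iterating, so each executor JVM is set up even though SparkLauncher only initialized the backend on the driver.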

      Attachments

        1. groupby.pig (0.3 kB, uploaded by liyunzhang)
        2. movies_data.csv (0.3 kB, uploaded by liyunzhang)


      People

        Assignee: Unassigned
        Reporter: Carlos Balduz
        Votes: 0
        Watchers: 4
