Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Description
Several ComputeSpec test cases failed on my cluster: ComputeSpec_5 through ComputeSpec_13.
These scripts use a SHIP clause in their DEFINE statement, and the shipped files include the script file itself, so we add the same file to the Spark context twice. This is not a problem on Hadoop, but Spark refuses to register the same filename twice:
Caused by: java.lang.IllegalArgumentException: requirement failed: File PigStreamingDepend.pl already registered.
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.rpc.netty.NettyStreamManager.addFile(NettyStreamManager.scala:69)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1386)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1348)
    at org.apache.spark.api.java.JavaSparkContext.addFile(JavaSparkContext.scala:662)
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.addResourceToSparkJobWorkingDirectory(SparkLauncher.java:462)
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.shipFiles(SparkLauncher.java:371)
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.addFilesToSparkJob(SparkLauncher.java:357)
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.uploadResources(SparkLauncher.java:235)
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:222)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
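For reference, the triggering pattern is a Pig script whose DEFINE ships its own script file, e.g. DEFINE CMD `PigStreamingDepend.pl` SHIP('PigStreamingDepend.pl');. Below is a minimal sketch of one possible guard, assuming the fix is to track file names already registered with the SparkContext and skip duplicates before calling addFile. The class, field, and method names here are hypothetical illustrations, not the actual Pig patch:

import java.io.File;
import java.util.HashSet;
import java.util.Set;

import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical sketch: deduplicate files before registering them with Spark.
// The stack trace above shows NettyStreamManager keys files by name and
// throws IllegalArgumentException ("File ... already registered") on a
// repeat, so a SHIP clause that includes the streaming script itself must
// be filtered out before the second addFile call.
public class DedupFileShipper {

    // Names of files already passed to sparkContext.addFile().
    private final Set<String> addedFileNames = new HashSet<>();

    private final JavaSparkContext sparkContext;

    public DedupFileShipper(JavaSparkContext sparkContext) {
        this.sparkContext = sparkContext;
    }

    // Adds the file to the Spark job only if a file with the same name has
    // not been added before; returns true if it was actually registered.
    public boolean addFileOnce(File localFile) {
        String name = localFile.getName();
        if (!addedFileNames.add(name)) {
            // Same file name was shipped already (e.g. via both the
            // streaming command and an explicit SHIP clause); skip it.
            return false;
        }
        sparkContext.addFile(localFile.toURI().toString());
        return true;
    }
}

Keying the set on the base file name rather than the full path mirrors what the trace shows of Spark's behavior, since NettyStreamManager rejects repeats by file name even when the paths differ.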