Description
RDD operations on the result of PySpark's cartesian() method raise a Py4JException.
Here are a few examples:
$ bin/pyspark
>>> rdd1=sc.parallelize([1,2,3,4,5,1])
>>> rdd2=sc.parallelize([11,12,13,14,15,11])
>>> rdd1.cartesian(rdd2).map(lambda x: x[0] + x[1]).collect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tshinagawa/Documents/Spark/RCs/spark-0.9.0-incubating-rc2/python/pyspark/rdd.py", line 446, in collect
    bytesInJava = self._jrdd.collect().iterator()
  File "/Users/tshinagawa/Documents/Spark/RCs/spark-0.9.0-incubating-rc2/python/pyspark/rdd.py", line 1041, in _jrdd
    class_tag)
  File "/Users/tshinagawa/Documents/Spark/RCs/spark-0.9.0-incubating-rc2/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 669, in __call__
  File "/Users/tshinagawa/Documents/Spark/RCs/spark-0.9.0-incubating-rc2/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 304, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.api.python.PythonRDD.
Trace:
py4j.Py4JException: Constructor org.apache.spark.api.python.PythonRDD([class org.apache.spark.rdd.CartesianRDD, class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.Boolean, class java.lang.String, class java.util.ArrayList, class org.apache.spark.Accumulator, class scala.reflect.ManifestFactory$$anon$2]) does not exist
	at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:184)
	at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:202)
	at py4j.Gateway.invoke(Gateway.java:213)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:744)

>>> rdd1.cartesian(rdd2).count()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tshinagawa/Documents/Spark/RCs/spark-0.9.0-incubating-rc2/python/pyspark/rdd.py", line 525, in count
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/Users/tshinagawa/Documents/Spark/RCs/spark-0.9.0-incubating-rc2/python/pyspark/rdd.py", line 516, in sum
    return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
  File "/Users/tshinagawa/Documents/Spark/RCs/spark-0.9.0-incubating-rc2/python/pyspark/rdd.py", line 482, in reduce
    vals = self.mapPartitions(func).collect()
  File "/Users/tshinagawa/Documents/Spark/RCs/spark-0.9.0-incubating-rc2/python/pyspark/rdd.py", line 446, in collect
    bytesInJava = self._jrdd.collect().iterator()
  File "/Users/tshinagawa/Documents/Spark/RCs/spark-0.9.0-incubating-rc2/python/pyspark/rdd.py", line 1041, in _jrdd
    class_tag)
  File "/Users/tshinagawa/Documents/Spark/RCs/spark-0.9.0-incubating-rc2/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 669, in __call__
  File "/Users/tshinagawa/Documents/Spark/RCs/spark-0.9.0-incubating-rc2/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 304, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.api.python.PythonRDD.
Trace:
py4j.Py4JException: Constructor org.apache.spark.api.python.PythonRDD([class org.apache.spark.rdd.CartesianRDD, class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.Boolean, class java.lang.String, class java.util.ArrayList, class org.apache.spark.Accumulator, class scala.reflect.ManifestFactory$$anon$2]) does not exist
	at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:184)
	at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:202)
	at py4j.Gateway.invoke(Gateway.java:213)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:744)
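For reference, here is the same reproduction as a standalone script (a minimal sketch: the explicit SparkContext setup and the app name "cartesian-repro" are my additions, since bin/pyspark above provides sc automatically). Note that the trace shows py4j failing to find a PythonRDD constructor whose first parameter accepts a CartesianRDD.

from pyspark import SparkContext

# Minimal standalone reproduction; "local" master and the app name
# are arbitrary choices for this sketch.
sc = SparkContext("local", "cartesian-repro")

rdd1 = sc.parallelize([1, 2, 3, 4, 5, 1])
rdd2 = sc.parallelize([11, 12, 13, 14, 15, 11])

# Both of these fail with the Py4JException shown above:
print(rdd1.cartesian(rdd2).map(lambda x: x[0] + x[1]).collect())
print(rdd1.cartesian(rdd2).count())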
This issue appeared after the custom serializer change:
https://github.com/apache/incubator-spark/commit/cbb7f04aef2220ece93dea9f3fa98b5db5f270d6
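Until this is fixed, one possible workaround (a hedged sketch, not an official fix) is to avoid constructing a CartesianRDD on the JVM side at all, by collecting the smaller RDD to the driver and building the pairs with flatMap. The names below are from the reproduction above.

# Workaround sketch: assumes rdd2 is small enough to collect driver-side.
small = rdd2.collect()
pairs = rdd1.flatMap(lambda a: [(a, b) for b in small])

# These succeed because no CartesianRDD is ever created on the JVM side:
print(pairs.map(lambda x: x[0] + x[1]).collect())
print(pairs.count())

This ships the collected list with each task inside the closure, so it only makes sense when one side of the product is small.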
Issue Links
- is related to SPARK-2601 "py4j.Py4JException on sc.pickleFile" (Resolved)