Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
2.0.0, 2.3.0
None
Description
Steps to reproduce:
# Run in the pyspark shell, where `spark` and `sc` are predefined.
from pyspark.streaming import StreamingContext

ssc = StreamingContext(spark.sparkContext, 10)

(ssc
    .queueStream([sc.range(10)])
    .transform(lambda rdd: rdd.cartesian(rdd))
    .pprint())

ssc.start()

## 16/10/01 21:34:30 ERROR JobScheduler: Error generating jobs for time 1475350470000 ms
## java.lang.ClassCastException: org.apache.spark.api.java.JavaPairRDD
## cannot be cast to org.apache.spark.api.java.JavaRDD
##     at com.sun.proxy.$Proxy15.call(Unknown Source)
## ....
A workaround is to append an identity `map(lambda x: x)` after `cartesian` inside the `transform` function. That this works suggests the problem is similar to https://issues.apache.org/jira/browse/SPARK-16589.
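The ClassCastException in the log points to a type mismatch: the function passed to `transform` hands back an RDD backed by a `JavaPairRDD`, while the JVM side of `transform` expects a `JavaRDD`, and the identity `map` yields a plain one. The pure-Python toy model below sketches that reading. It is not PySpark code: `JavaRDDHandle`, `JavaPairRDDHandle`, `ToyRDD`, and `jvm_transform` are hypothetical stand-ins, a Python `TypeError` plays the role of the Java `ClassCastException`, and the assumption that `cartesian` keeps a pair-typed handle while `map` does not is inferred from the log and the workaround.

```python
"""Toy model (hypothetical, not PySpark code) of the ClassCastException above."""
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable, List


@dataclass
class JavaRDDHandle:
    """Hypothetical stand-in for a JVM-side JavaRDD reference."""
    data: List[Any]


@dataclass
class JavaPairRDDHandle:
    """Hypothetical stand-in for a JVM-side JavaPairRDD.

    Deliberately NOT a subclass of JavaRDDHandle, so the "cast" below fails.
    """
    data: List[Any]


class ToyRDD:
    """Python-side wrapper holding a JVM handle, loosely mirroring RDD._jrdd."""

    def __init__(self, jrdd: Any) -> None:
        self.jrdd = jrdd

    def cartesian(self, other: "ToyRDD") -> "ToyRDD":
        # Assumption drawn from the log: the result keeps a pair-typed handle.
        pairs = list(product(self.jrdd.data, other.jrdd.data))
        return ToyRDD(JavaPairRDDHandle(pairs))

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # Assumption: mapping always produces a fresh, plain handle.
        return ToyRDD(JavaRDDHandle([f(x) for x in self.jrdd.data]))


def jvm_transform(rdd: ToyRDD, func: Callable[[ToyRDD], ToyRDD]) -> JavaRDDHandle:
    """Mimics a JVM-side callback whose declared return type is a plain RDD."""
    result = func(rdd).jrdd
    if not isinstance(result, JavaRDDHandle):
        # Toy equivalent of the ClassCastException in the log.
        raise TypeError(f"{type(result).__name__} cannot be cast to JavaRDDHandle")
    return result


source = ToyRDD(JavaRDDHandle(list(range(3))))

# Without the workaround, the pair-typed handle is rejected.
try:
    jvm_transform(source, lambda rdd: rdd.cartesian(rdd))
except TypeError as exc:
    print("fails:", exc)  # prints: fails: JavaPairRDDHandle cannot be cast to JavaRDDHandle

# With the identity map, the result is a plain handle holding all 3 x 3 pairs.
ok = jvm_transform(source, lambda rdd: rdd.cartesian(rdd).map(lambda x: x))
print(len(ok.data))  # prints: 9
```

Applied to the reproduction above, the workaround would read `.transform(lambda rdd: rdd.cartesian(rdd).map(lambda x: x))`.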
Issue Links
- relates to
  - SPARK-16589 Chained cartesian produces incorrect number of records (Resolved)
- links to