Spark / SPARK-17756

java.lang.ClassCastException when using cartesian with DStream.transform

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: DStreams, PySpark
    • Labels: None

      Description

      Steps to reproduce:

      from pyspark.streaming import StreamingContext

      sc = spark.sparkContext  # `spark` is the SparkSession provided by the PySpark shell
      ssc = StreamingContext(sc, 10)
      (ssc
          .queueStream([sc.range(10)])
          .transform(lambda rdd: rdd.cartesian(rdd))
          .pprint())

      ssc.start()

      ## 16/10/01 21:34:30 ERROR JobScheduler: Error generating jobs for time 1475350470000 ms
      ## java.lang.ClassCastException: org.apache.spark.api.java.JavaPairRDD cannot be cast to org.apache.spark.api.java.JavaRDD
      ## 	at com.sun.proxy.$Proxy15.call(Unknown Source)
      ##    ....
      

      A dummy fix is to append map(lambda x: x) after cartesian(), which suggests the problem is similar to https://issues.apache.org/jira/browse/SPARK-16589
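      A minimal self-contained sketch of that workaround (the local master, app name, batch interval, and result collection via foreachRDD are illustration choices, not part of the original report):

      ```python
      from pyspark.sql import SparkSession
      from pyspark.streaming import StreamingContext

      # Assumption: running outside the PySpark shell, so we build our own session.
      spark = (SparkSession.builder
               .master("local[2]")
               .appName("spark-17756-workaround")
               .getOrCreate())
      sc = spark.sparkContext

      results = []  # foreachRDD runs its function on the driver, so this is safe
      ssc = StreamingContext(sc, 1)
      (ssc
          .queueStream([sc.range(3)])
          # Workaround: the identity map() after cartesian() keeps the result a
          # plain RDD, sidestepping the JavaPairRDD -> JavaRDD cast error.
          .transform(lambda rdd: rdd.cartesian(rdd).map(lambda x: x))
          .foreachRDD(lambda rdd: results.extend(rdd.collect())))

      ssc.start()
      ssc.awaitTerminationOrTimeout(10)
      ssc.stop(stopSparkContext=True)
      print(sorted(results))  # all 9 pairs from range(3) x range(3)
      ```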


              People

              • Assignee: Hyukjin Kwon (hyukjin.kwon)
              • Reporter: Maciej Szymkiewicz (zero323)
              • Votes: 0
              • Watchers: 4
