SPARK-17756

java.lang.ClassCastException when using cartesian with DStream.transform

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: DStreams, PySpark
    • Labels: None

    Description

      Steps to reproduce:

      # Run in the pyspark shell, where `sc` is predefined.
      from pyspark.streaming import StreamingContext
      
      ssc = StreamingContext(sc, 10)  # 10-second batch interval
      (ssc
          .queueStream([sc.range(10)])
          .transform(lambda rdd: rdd.cartesian(rdd))
          .pprint())
      
      ssc.start()
      
      ## 16/10/01 21:34:30 ERROR JobScheduler: Error generating jobs for time 1475350470000 ms
      ## java.lang.ClassCastException: org.apache.spark.api.java.JavaPairRDD cannot be cast to org.apache.spark.api.java.JavaRDD
      ##     at com.sun.proxy.$Proxy15.call(Unknown Source)
      ##     ....
      

      A dummy fix is to add a no-op map(lambda x: x), which suggests the problem is similar to https://issues.apache.org/jira/browse/SPARK-16589
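The dummy fix might look roughly like the sketch below. The report does not say where the map goes, so this assumes it is applied to the result of cartesian inside the transform function. The names cartesian_with_identity_map and main, and the application name, are hypothetical and chosen only for illustration.

```python
def cartesian_with_identity_map(rdd):
    """Return the cartesian product of `rdd` with itself, passed through a no-op map.

    Assumption: the trailing map(lambda x: x) from the report is applied
    directly to the result of cartesian, inside the transform function.
    """
    return rdd.cartesian(rdd).map(lambda x: x)


def main():
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="cartesian-transform-workaround")  # hypothetical name
    ssc = StreamingContext(sc, 10)  # 10-second batch interval, as in the report
    (ssc
        .queueStream([sc.range(10)])
        .transform(cartesian_with_identity_map)
        .pprint())

    ssc.start()
    ssc.awaitTerminationOrTimeout(30)  # let a few batches run
    ssc.stop()
```

Calling main() in an environment with PySpark installed runs the same pipeline as the steps to reproduce, with the extra map in place. On versions that include the fix (2.4.0 per the issue details), the extra map should no longer be needed.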

            People

              Assignee: Hyukjin Kwon (gurwls223)
              Reporter: Maciej Szymkiewicz (zero323)
              Votes: 0
              Watchers: 3