Spark / SPARK-17756

java.lang.ClassCastException when using cartesian with DStream.transform


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: DStreams, PySpark
    • Labels: None

    Description

      Steps to reproduce:

      # Run in the PySpark shell, where both `spark` and `sc` are predefined.
      from pyspark.streaming import StreamingContext
      
      ssc = StreamingContext(sc, 10)
      (ssc
          .queueStream([sc.range(10)])
          .transform(lambda rdd: rdd.cartesian(rdd))
          .pprint())
      
      ssc.start()
      
      ## 16/10/01 21:34:30 ERROR JobScheduler: Error generating jobs for time 1475350470000 ms
      ## java.lang.ClassCastException: org.apache.spark.api.java.JavaPairRDD cannot be cast to org.apache.spark.api.java.JavaRDD
      ## 	at com.sun.proxy.$Proxy15.call(Unknown Source)
      ##    ....
      

      A workaround is to append .map(lambda x: x) to the transformed RDD, which suggests the problem is similar to https://issues.apache.org/jira/browse/SPARK-16589 (see the sketch below).
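
      A minimal sketch of that workaround, assuming a fresh PySpark shell session (only one StreamingContext can be active at a time); the only change from the failing reproduction above is the trailing identity map() on the result of cartesian():

      from pyspark.streaming import StreamingContext
      
      # Same pipeline as the failing reproduction, plus an identity map() after
      # cartesian(). The extra map() rewraps the result so the JVM side receives
      # a plain JavaRDD rather than a JavaPairRDD (an assumption, based on the
      # similarity to SPARK-16589); with it, pprint() prints the cartesian pairs
      # for the queued batch instead of raising the ClassCastException.
      ssc = StreamingContext(sc, 10)
      (ssc
          .queueStream([sc.range(10)])
          .transform(lambda rdd: rdd.cartesian(rdd).map(lambda x: x))
          .pprint())
      
      ssc.start()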

    People

      Assignee: hyukjin.kwon (Hyukjin Kwon)
      Reporter: zero323 (Maciej Szymkiewicz)
      Votes: 0
      Watchers: 4
