Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-5361

Multiple Java RDD <-> Python RDD conversions not working correctly

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.2, 1.3.0
    • Component/s: PySpark
    • Labels:
      None
    • Target Version/s:

      Description

      This is found through reading RDD from `sc.newAPIHadoopRDD` and writing it back using `rdd.saveAsNewAPIHadoopFile` in pyspark.

      It turns out that whenever there are multiple RDD conversions from JavaRDD to PythonRDD then back to JavaRDD, the exception below happens:

      15/01/16 10:28:31 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 7)
      java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to java.util.ArrayList
      	at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:157)
      	at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:153)
      	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
      

      The test case code below reproduces it:

      from pyspark.rdd import RDD
      
      dl = [
          (u'2', {u'director': u'David Lean'}), 
          (u'7', {u'director': u'Andrew Dominik'})
      ]
      
      dl_rdd = sc.parallelize(dl)
      tmp = dl_rdd._to_java_object_rdd()
      tmp2 = sc._jvm.SerDe.javaToPython(tmp)
      t = RDD(tmp2, sc)
      t.count()
      
      tmp = t._to_java_object_rdd()
      tmp2 = sc._jvm.SerDe.javaToPython(tmp)
      t = RDD(tmp2, sc)
      t.count() # it blows up here during the 2nd time of conversion
      

        Attachments

          Activity

            People

            • Assignee:
              wingchen Winston Chen
              Reporter:
              wingchen Winston Chen
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: