Spark / SPARK-12834

Use type conversion instead of Ser/De of Pickle to transform JavaArray and JavaList

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: PySpark
    • Labels:
      None

      Description

      According to the Ser/De code on the Python side:

      StringIndexerModel
        def _java2py(sc, r, encoding="bytes"):
          if isinstance(r, JavaObject):
              clsName = r.getClass().getSimpleName()
              # convert RDD into JavaRDD
              if clsName != 'JavaRDD' and clsName.endswith("RDD"):
                  r = r.toJavaRDD()
                  clsName = 'JavaRDD'
      
              if clsName == 'JavaRDD':
                  jrdd = sc._jvm.SerDe.javaToPython(r)
                  return RDD(jrdd, sc)
      
              if clsName == 'DataFrame':
                  return DataFrame(r, SQLContext.getOrCreate(sc))
      
              if clsName in _picklable_classes:
                  r = sc._jvm.SerDe.dumps(r)
              elif isinstance(r, (JavaArray, JavaList)):
                  try:
                      r = sc._jvm.SerDe.dumps(r)
                  except Py4JJavaError:
                      pass  # not picklable
      
          if isinstance(r, (bytearray, bytes)):
              r = PickleSerializer().loads(bytes(r), encoding=encoding)
          return r
      

      We use SerDe.dumps (in PythonMLLibAPI) to serialize JavaArray and JavaList, then deserialize them with PickleSerializer on the Python side. This round trip through Pickle is unnecessary and inefficient. Instead, we can convert them directly with type conversion, e.g. list(JavaArray) or list(JavaList). In addition, Ser/De of a Scala Array has an issue of its own, as described in https://issues.apache.org/jira/browse/SPARK-12780
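
      A minimal sketch of the proposed type conversion, not the actual patch. FakeJavaArray and convert_collection are hypothetical names introduced for illustration; in PySpark the collection types would be JavaArray and JavaList from py4j.java_collections, which is assumed here to expose the same sequence behaviour (len() and indexing) as the stand-in.

```python
class FakeJavaArray:
    """Hypothetical stand-in for a py4j JavaArray/JavaList proxy.

    It only supports len() and indexing, which is enough for list().
    """

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # Raises IndexError past the end, which ends iteration.
        return self._items[index]


def convert_collection(r, collection_types):
    """Convert r to a Python list if it is one of collection_types.

    collection_types is a hypothetical parameter; in PySpark it would be
    (JavaArray, JavaList) instead of the stand-in used below.
    """
    if isinstance(r, collection_types):
        # Direct type conversion: no SerDe.dumps, no PickleSerializer.
        return list(r)
    return r


labels = convert_collection(FakeJavaArray(["a", "b", "c"]), (FakeJavaArray,))
unchanged = convert_collection(42, (FakeJavaArray,))
```

      Note that list() converts only the outer container. Any element that is itself a Java object would presumably still need its own conversion.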

            People

            • Assignee: Xusen Yin (yinxusen)
            • Reporter: Xusen Yin (yinxusen)
            • Votes: 0
            • Watchers: 3