SPARK-23842: Accessing Java from PySpark lambda functions

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.2.0, 2.2.1, 2.3.0
    • Fix Version/s: None
    • Component/s: PySpark

      Description

      Copied from https://github.com/bartdag/py4j/issues/311, but it seems to be more of a Spark issue than a py4j one.

      We have a third-party Java library that is distributed to Spark executors through the --jars parameter.
      We want to call a static Java method in that library on the executor side through Spark's map(), or create an object of one of that library's classes through a mapPartitions() call.
      None of the approaches we tried has worked so far. It seems Spark tries to serialize everything it sees in a lambda function and distribute it to the executors.
      I am aware of an older py4j issue/question, #171, but that discussion did not help.
      We thought we could create a reference to that "class" through a call like {{genmodel = spark._jvm.hex.genmodel}} and then go through py4j to expose that library's functionality in lambda functions running on PySpark executors.
      We don't want Spark to try to serialize the Spark session variable "spark" or its reference to the py4j gateway, spark._jvm (because that leads to the expected non-serializable exceptions), so we tried to "trick" Spark into not serializing them by nesting the above genmodel = spark._jvm.hex.genmodel inside an exec() call.
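
The serialization failure described above can be illustrated without Spark. This is a minimal stdlib-only sketch, not Spark's actual code path: `FakeGateway` and `try_pickle` are hypothetical names, and a `threading.Lock` stands in for the process-local resources (such as open connections) that a real py4j gateway holds. PySpark ships functions with cloudpickle rather than plain `pickle`, but the underlying limitation is similar: an object tied to live resources of the driver process cannot be turned into bytes.

```python
import pickle
import threading


class FakeGateway:
    """Hypothetical stand-in for a py4j gateway object.

    It owns a lock, which (like a socket) only makes sense inside the
    process that created it and therefore cannot be pickled.
    """

    def __init__(self):
        self._lock = threading.Lock()


def try_pickle(obj):
    """Return None if obj pickles cleanly, else a short error description."""
    try:
        pickle.dumps(obj)
        return None
    except Exception as exc:  # pickle raises TypeError for lock objects
        return f"{type(exc).__name__}: {exc}"


plain_data_error = try_pickle({"row": 1})   # ordinary data serializes fine
gateway_error = try_pickle(FakeGateway())   # the gateway-like object does not

print(plain_data_error)
print(gateway_error)
```

Any function whose captured variables include such an object runs into the same problem when it is serialized for shipping to another process.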
      That led to another issue: neither the spark (Spark session) nor the sc (Spark context) variable seems to be available in lambda functions on the Spark executors. So we're stuck and don't know how to call a generic Java class through py4j on the executor side (from within map or mapPartitions lambda functions).
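
Why the exec() trick trades one failure for another can be sketched in plain Python. This is a rough analogy under stated assumptions, not a reproduction of Spark's behavior: `gateway`, `lookup`, and the helper functions below are hypothetical, and rebuilding a function with empty globals only approximates what happens when a function is shipped to a separate Python worker process. The point is that a name hidden inside an exec() string is invisible to anything inspecting the function's referenced names, so it is not shipped, yet it must still resolve when the function runs.

```python
import builtins
import types


def call_directly(row):
    # The global name `gateway` is referenced in this function's bytecode.
    return gateway.lookup(row)


def call_via_exec(row):
    # Here `gateway` appears only inside a string constant, so it is not
    # among the names this function's bytecode refers to.
    scope = {"row": row}
    exec("result = gateway.lookup(row)", globals(), scope)
    return scope["result"]


def names_referenced(func):
    """Names (globals and attributes) that func's bytecode refers to."""
    return set(func.__code__.co_names)


def run_with_empty_globals(func, arg):
    """Rebuild func with no globals except builtins, then call it.

    This loosely mimics running only the shipped code in another process.
    """
    clone = types.FunctionType(func.__code__, {"__builtins__": builtins})
    try:
        return clone(arg)
    except NameError as exc:
        return f"NameError: {exc}"


direct_sees_gateway = "gateway" in names_referenced(call_directly)
exec_sees_gateway = "gateway" in names_referenced(call_via_exec)
remote_result = run_with_empty_globals(call_via_exec, 1)

print(direct_sees_gateway, exec_sees_gateway)
print(remote_result)
```

In this sketch the exec() version avoids referencing the gateway-like object at serialization time, and then fails at call time because nothing by that name exists where the code runs.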
      It would be easier from Scala or Java, since those can call that third-party library's methods directly, but our users are asking for a way to do the same from PySpark.

            People

            • Assignee: Unassigned
            • Reporter: Tagar Ruslan Dautkhanov
            • Votes: 0
            • Watchers: 4
