Spark / SPARK-45136

Improve ClosureCleaner to support closures defined in Ammonite REPL


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0, 3.5.1
    • Fix Version/s: 4.0.0
    • Component/s: Connect

    Description

      ConnectRepl uses the Ammonite REPL with `CodeClassWrapper` to run Scala code, which means each code cell is wrapped in its own object. When a lambda references a variable from a cell, it captures that cell's whole wrapper object, and with it every other variable defined in the same cell or code block. This increases the serialized UDF payload size, or makes the UDF non-serializable if any of those variables cannot be serialized.
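The payload growth can be illustrated with a simplified, plain-Scala model that involves neither Spark nor Ammonite. It assumes a cell's vals end up as fields of one serializable instance; `CellWrapper`, `PayloadDemo`, and `serializedSize` are hypothetical names, Java serialization stands in for UDF serialization, and the real generated wrappers differ in detail. A lambda that reads `cell.x` drags the unused 1 MiB `big` field along, while one that reads a local copy does not.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical, simplified stand-in for the wrapper generated for one REPL cell:
// every val defined in the cell becomes a field of the same serializable instance.
class CellWrapper extends Serializable {
  val x: Int = 100
  val big: Array[Byte] = new Array[Byte](1 << 20) // 1 MiB that the lambda never reads
}

object PayloadDemo {
  // Size in bytes of the Java-serialized form of `obj`.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val cell = new CellWrapper
    // Reading cell.x inside the lambda captures the whole wrapper, `big` included.
    val viaWrapper: Long => Long = i => i + cell.x
    // Copying the value out first leaves only a boxed Int in the closure.
    val x = cell.x
    val viaLocal: Long => Long = i => i + x
    println(s"captures wrapper:  ${serializedSize(viaWrapper)} bytes")
    println(s"captures Int only: ${serializedSize(viaLocal)} bytes")
  }
}
```

The first size exceeds 1 MiB even though the lambda only needs an `Int`; that gap is the extra payload the description refers to.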

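      For example, consider the following cells (Ammonite treats a brace-wrapped block as a single multi-statement cell):

      // cell 0: setup
      import org.apache.spark.sql.functions._
      import spark.implicits._
      class NonSerializable // does not extend java.io.Serializable

      // cell 1
      {
        val x = 100
        val y = new NonSerializable
      }

      // cell 2
      spark.range(10).map(i => i + x).agg(sum("value")).collect()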

      This fails because the lambda captures the cell 1 wrapper object, which holds both `x` and `y`, and `y` is not serializable.
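The same failure can be reproduced outside Spark with a simplified model of the cell 1 wrapper. `Cell1Wrapper`, `CaptureFailureDemo`, and `trySerialize` are hypothetical names, the local `NonSerializable` class mirrors the one in the example, and Java serialization is only a stand-in for how the UDF is actually shipped. The lambda reads only `x`, yet serializing it fails on `y`.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical class standing in for the user's NonSerializable type.
class NonSerializable

// Hypothetical, simplified stand-in for the wrapper of "cell 1": x and y share one instance.
class Cell1Wrapper extends Serializable {
  val x: Int = 100
  val y: NonSerializable = new NonSerializable
}

object CaptureFailureDemo {
  // Tries Java serialization; Left carries the exception message (the offending class name).
  def trySerialize(obj: AnyRef): Either[String, Int] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try {
      out.writeObject(obj)
      out.flush()
      Right(bytes.size())
    } catch {
      case e: NotSerializableException => Left(e.getMessage)
    } finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val cell1 = new Cell1Wrapper
    // The lambda only reads x, yet it captures the whole wrapper, so y must be serialized too.
    val f: Long => Long = i => i + cell1.x
    println(trySerialize(f)) // a Left naming the NonSerializable class
  }
}
```

Binding `val x0 = cell1.x` before defining the lambda and using `x0` inside it would serialize fine, because only the `Int` is captured. That is a manual user-side workaround, not the ClosureCleaner change this issue describes.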


    People

      Assignee: Vsevolod Stepanov
      Reporter: Vsevolod Stepanov
