Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Version/s: 4.0.0, 3.5.1
Description
ConnectRepl uses the Ammonite REPL with CodeClassWrapper to run Scala code, which means that each code cell is wrapped in a separate object. If multiple variables are defined in the same cell or code block, a lambda that references only one of them captures the enclosing wrapper object and, with it, all the other variables. This increases the serialized UDF payload size or makes the UDF non-serializable.
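The payload-size effect can be modeled outside the REPL. The sketch below is a simplified model, not Ammonite's actual generated code: `CellWrapper`, `serializedSize`, and `payloadSizes` are hypothetical names, and plain Java serialization stands in for the way Spark serializes a UDF. It compares a lambda that reads `x` through the wrapper with one that reads a local copy of `x`.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical model of the object a REPL wrapper creates for one cell:
// every val defined in that cell becomes a field of the same instance.
class CellWrapper extends Serializable {
  val x: Int = 100
  val big: Array[Byte] = new Array[Byte](1 << 20) // 1 MiB, never used by the lambda
}

// Size in bytes of the Java-serialized form of `obj`.
def serializedSize(obj: AnyRef): Int = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(obj)
  out.close()
  bytes.size()
}

def payloadSizes(): (Int, Int) = {
  val cell = new CellWrapper
  // Reads x through the wrapper, so the closure captures the whole wrapper,
  // including the unrelated `big` array.
  val viaWrapper: Long => Long = i => i + cell.x
  // Copies x into a local first, so the closure captures only an Int.
  val localX = cell.x
  val viaLocal: Long => Long = i => i + localX
  (serializedSize(viaWrapper), serializedSize(viaLocal))
}
```

Calling `payloadSizes()` should report a first size above 1 MiB, because the unused `big` array travels with the closure, while the second stays small.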
For example, this code
// cell 1
{
  val x = 100
  val y = new NonSerializable // any class that does not extend java.io.Serializable
}
// cell 2
spark.range(10).map(i => i + x).agg(sum("value")).collect()
will fail because the lambda captures both `x` and `y`: they are defined in the same wrapper object, and `y` cannot be serialized.
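Under the same kind of simplified model (hypothetical `Cell1Wrapper` and `canSerialize` names, Java serialization instead of Spark's own UDF serialization path), the failure comes from serializing the wrapper's `y` field, not from anything the lambda itself uses:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for the NonSerializable class in the example:
// it does not extend java.io.Serializable.
class NonSerializable

// Hypothetical model of the cell-1 wrapper, holding both `x` and `y`.
class Cell1Wrapper extends Serializable {
  val x: Int = 100
  val y: NonSerializable = new NonSerializable
}

// True if Java serialization of `obj` succeeds, false on NotSerializableException.
def canSerialize(obj: AnyRef): Boolean =
  try {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(obj)
    out.close()
    true
  } catch {
    case _: NotSerializableException => false
  }

def check(): (Boolean, Boolean) = {
  val cell1 = new Cell1Wrapper
  // Captures the wrapper instance, so `y` is serialized along with `x`.
  val capturesWrapper: Long => Long = i => i + cell1.x
  // Captures only a local Int copy of `x`.
  val localX = cell1.x
  val capturesIntOnly: Long => Long = i => i + localX
  (canSerialize(capturesWrapper), canSerialize(capturesIntOnly))
}
```

Here `check()` is expected to return `(false, true)`: the lambda that goes through the wrapper fails with `NotSerializableException`, while the one that captures only a local `Int` serializes.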