Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
For example, the following code does not work.
val Rc = sc.broadcast(R)
var msb = sc.broadcast(ms)
var usb = sc.broadcast(us)
ms = sc.parallelize(0 until M, slices)
  .map((i: Int) => update(i, msb.value, usb.value, Rc.value))
  .collect()
This results in the following exception.
org.apache.commons.lang.SerializationException: java.io.NotSerializableException: edu.snu.nemo.compiler.frontend.spark.core.SparkContext
It seems that 'sc' somehow gets included in the 'map' closure when the broadcast variables Rc/msb/usb are used.
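For context, one common way a non-serializable object ends up inside a 'map' closure in Scala is accidental capture of the enclosing instance: if the lambda reads a field of an object that also holds the non-serializable value, serializing the lambda serializes that whole object. Whether this particular capture path matches the root cause here still needs to be confirmed. Below is a minimal, Spark/Nemo-independent sketch of the pattern; all names (Context, Driver, CaptureDemo) are hypothetical.

import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-ins: 'Context' plays the role of the non-serializable
// SparkContext, and 'Driver' the role of the user program that holds it.
class Context                                  // not Serializable
class Driver extends Serializable {
  val context = new Context                    // stands in for 'sc'
  val broadcastData = "broadcast payload"      // stands in for Rc/msb/usb

  // The lambda reads 'broadcastData' through 'this', so serializing the lambda
  // also serializes 'Driver', which drags in the non-serializable 'context'.
  def capturedViaOuter: Int => String = (i: Int) => s"$i -> $broadcastData"

  // Copying the field into a local val first makes the lambda capture only the
  // (serializable) string, not the enclosing 'Driver' instance.
  def capturedLocally: Int => String = {
    val local = broadcastData
    (i: Int) => s"$i -> $local"
  }
}

object CaptureDemo extends App {
  def check(name: String, f: AnyRef): Unit =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(f)
      println(s"$name: serializable")
    } catch {
      case e: NotSerializableException => println(s"$name: NOT serializable (${e.getMessage})")
    }

  val d = new Driver
  check("capturedViaOuter", d.capturedViaOuter)  // fails: Context is not serializable
  check("capturedLocally", d.capturedLocally)    // succeeds
}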
Interestingly, the following code, in which the closures are declared before the call to 'map', works.
val Rc = sc.broadcast(R)
var msb = sc.broadcast(ms)
var usb = sc.broadcast(us)
val update_ms = (i: Int) => update(i, msb.value, usb.value, Rc.value)
val update_us = (i: Int) => update(i, usb.value, msb.value, Rc.value.transpose())
println(s"Iteration $iter:")
ms = sc.parallelize(0 until M, slices)
  .map(update_ms)
  .collect()
We should find the root cause and fix the problem in order to support existing RDD programs, which are most likely not written with this workaround.
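As a first step, it might also help to surface such failures on the driver with a pointed message, similar in spirit to the serializability check that Spark's ClosureCleaner performs before shipping a closure; the longer-term fix presumably requires cleaning closures of outer references they do not actually use, as Spark does. A hypothetical helper sketch follows (the object name and message are made up, not Nemo's actual API).

import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper: eagerly try to serialize a user closure on the driver so
// that an accidental capture of the SparkContext is reported up front, rather
// than deep inside job execution. A sketch only, not Nemo's current code.
object ClosureCheck {
  def ensureSerializable(closure: AnyRef): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try out.writeObject(closure)
    catch {
      case e: NotSerializableException =>
        throw new IllegalArgumentException(
          s"Task closure is not serializable (did it capture the SparkContext?): ${e.getMessage}", e)
    } finally out.close()
  }
}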