Apache Nemo / NEMO-205

RDD Closure with Broadcast Variables Serialization Bug


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      For example, this does not work.

      val Rc = sc.broadcast(R)
      var msb = sc.broadcast(ms)
      var usb = sc.broadcast(us)

      ms = sc.parallelize(0 until M, slices)
        .map((i: Int) => update(i, msb.value, usb.value, Rc.value))
        .collect()

      This results in the following exception.

      org.apache.commons.lang.SerializationException: java.io.NotSerializableException: edu.snu.nemo.compiler.frontend.spark.core.SparkContext

      It seems that 'sc' somehow gets included in the 'map' closure when the broadcast variables Rc/msb/usb are used.

       

      Interestingly, the following code, with the closures declared prior to 'map', works.

      val Rc = sc.broadcast(R)
      var msb = sc.broadcast(ms)
      var usb = sc.broadcast(us)

      val update_ms = (i: Int) => update(i, msb.value, usb.value, Rc.value)
      val update_us = (i: Int) => update(i, usb.value, msb.value, Rc.value.transpose())

      println(s"Iteration $iter:")
      ms = sc.parallelize(0 until M, slices)
        .map(update_ms)
        .collect()

       

      We should find the root cause and fix the problem in order to support existing RDD programs, which are most likely not written in this workaround style.
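      A plausible root cause is ordinary Scala closure capture: when a lambda references a broadcast handle that is a field (or an outer-scope member), the reference is really 'this.msb', so the closure captures the whole enclosing object, including the non-serializable context; copying the handle into a local val first means only that local is captured. The sketch below reproduces this with hypothetical stand-in classes (FakeContext and FakeBroadcast are illustrative, not the real SparkContext/Broadcast), using plain Java serialization to surface the NotSerializableException.

      ```scala
      import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

      // Hypothetical stand-ins, not the real Nemo/Spark classes:
      // FakeContext plays the non-serializable SparkContext, FakeBroadcast a
      // serializable broadcast handle (case classes are Serializable).
      class FakeContext
      case class FakeBroadcast[T](value: T)

      class Driver {
        val sc  = new FakeContext               // non-serializable field, like 'sc'
        val msb = FakeBroadcast(Array(1.0, 2.0))

        // 'msb' here is really 'this.msb', so the lambda captures the whole
        // Driver instance -- dragging in the non-serializable 'sc' with it.
        def capturingClosure: Int => Double = (i: Int) => msb.value(0) + i

        // Copying the broadcast handle into a local val first means the lambda
        // captures only that serializable local, mirroring the working snippet.
        def safeClosure: Int => Double = {
          val local = msb
          (i: Int) => local.value(0) + i
        }
      }

      object ClosureCaptureDemo {
        // True iff 'obj' survives a round of Java serialization.
        def serializable(obj: AnyRef): Boolean =
          try {
            new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
            true
          } catch {
            case _: NotSerializableException => false
          }

        def main(args: Array[String]): Unit = {
          val d = new Driver
          println(serializable(d.capturingClosure)) // false: captures Driver/sc
          println(serializable(d.safeClosure))      // true: only the local is captured
        }
      }
      ```

      If this is indeed the mechanism, a fix would need a closure cleaner along the lines of Spark's ClosureCleaner, which nulls out unused outer references before serializing the task, rather than requiring users to hoist closures into local vals.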

Attachments

Activity

People

    Assignee: Unassigned
    Reporter: John Yang (johnyangk)
    Votes: 0
    Watchers: 1