Spark / SPARK-39983

Should not cache unserialized broadcast relations on the driver


Details

    Description

      In TorrentBroadcast.writeBlocks, we store the unserialized broadcast object in addition to its serialized version:

      private def writeBlocks(value: T): Int = {
        import StorageLevel._
        // Store a copy of the broadcast variable in the driver so that tasks run on the driver
        // do not create a duplicate copy of the broadcast variable's value.
        val blockManager = SparkEnv.get.blockManager
        if (!blockManager.putSingle(broadcastId, value, MEMORY_AND_DISK, tellMaster = false)) {
          throw new SparkException(s"Failed to store $broadcastId in BlockManager")
        }
        // ... (remainder elided: the value is then serialized into blocks, which are stored as well)
      }

      In the case of broadcast relations, these objects can be fairly large (60 MB in one observed case), and keeping them on the driver is not strictly necessary.
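      The unserialized copy is an optimization rather than a requirement because the serialized blocks alone are, in principle, enough to rebuild the value when a driver-side task needs it, at the cost of one deserialization. The Scala sketch below is a toy model of that round trip, assuming plain Java serialization and fixed-size splitting; BlockRoundTrip, blockify, and unblockify are hypothetical names, and Spark itself goes through its configured serializer and an optional compression codec instead.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical toy model (not Spark code): serialize a value with plain Java
// serialization, split the bytes into fixed-size blocks, and rebuild the value
// from those blocks alone, without any cached object copy.
object BlockRoundTrip {
  // Serialize `value` and cut the resulting bytes into pieces of at most `blockSize` bytes.
  def blockify(value: AnyRef, blockSize: Int): Seq[Array[Byte]] = {
    val bytesOut = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytesOut)
    try out.writeObject(value) finally out.close()
    bytesOut.toByteArray.grouped(blockSize).toVector
  }

  // Concatenate the pieces in order and deserialize the original value.
  def unblockify[T](blocks: Seq[Array[Byte]]): T = {
    val joined = new ByteArrayOutputStream()
    blocks.foreach(block => joined.write(block, 0, block.length))
    val in = new ObjectInputStream(new ByteArrayInputStream(joined.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

      For example, BlockRoundTrip.unblockify[Array[Long]](BlockRoundTrip.blockify(data, 1024)) yields an array with the same elements as data, so a driver that keeps only the blocks can still recover the value on demand.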

      Add an option to not keep the unserialized version of these objects on the driver.
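      One possible shape for such an option, shown as a hypothetical Scala toy model rather than a patch against TorrentBroadcast: a cacheUnserialized flag decides whether the driver keeps the object copy alongside the serialized pieces. ToyDriverStore, ToyBroadcast, and cacheUnserialized are invented for illustration and are not Spark APIs; plain Java serialization is an assumption of the sketch.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.collection.mutable

// Hypothetical toy model of the proposed option; none of these names are Spark APIs.
final class ToyDriverStore {
  // Unserialized values kept on the driver, keyed by broadcast id.
  val objects: mutable.Map[Long, AnyRef] = mutable.Map.empty
  // Serialized pieces, keyed by (broadcast id, piece index).
  val pieces: mutable.Map[(Long, Int), Array[Byte]] = mutable.Map.empty
}

object ToyBroadcast {
  // `cacheUnserialized` stands in for the proposed option: when false, the
  // driver keeps only the serialized pieces and skips the object copy.
  def writeBlocks(
      store: ToyDriverStore,
      id: Long,
      value: AnyRef,
      blockSize: Int,
      cacheUnserialized: Boolean): Int = {
    if (cacheUnserialized) store.objects(id) = value

    // Serialize with plain Java serialization (an assumption for this sketch).
    val bytesOut = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytesOut)
    try out.writeObject(value) finally out.close()

    // Split into fixed-size pieces and store each one under (id, index).
    val blocks = bytesOut.toByteArray.grouped(blockSize).toVector
    blocks.zipWithIndex.foreach { case (block, i) => store.pieces((id, i)) = block }
    blocks.length // number of pieces written
  }
}
```

      With the flag off, the driver holds only the serialized pieces instead of both copies; the trade-off is that a task running on the driver has to deserialize the value again.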

          People

            alex-balikov Alex Balikov
            Votes: 0
            Watchers: 4