Spark / SPARK-19556

Broadcast data is not encrypted when I/O encryption is on

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: Spark Core
    • Labels: None

    Description

      TorrentBroadcast uses a couple of "back doors" into the block manager to write and read data:

            if (!blockManager.putBytes(pieceId, bytes, MEMORY_AND_DISK_SER, tellMaster = true)) {
              throw new SparkException(s"Failed to store $pieceId of $broadcastId in local BlockManager")
            }
      
            bm.getLocalBytes(pieceId) match {
              case Some(block) =>
                blocks(pid) = block
                releaseLock(pieceId)
              case None =>
                bm.getRemoteBytes(pieceId) match {
                  case Some(b) =>
                    if (checksumEnabled) {
                      val sum = calcChecksum(b.chunks(0))
                      if (sum != checksums(pid)) {
                        throw new SparkException(s"corrupt remote block $pieceId of $broadcastId:" +
                          s" $sum != ${checksums(pid)}")
                      }
                    }
                    // We found the block from remote executors/driver's BlockManager, so put the block
                    // in this executor's BlockManager.
                    if (!bm.putBytes(pieceId, b, StorageLevel.MEMORY_AND_DISK_SER, tellMaster = true)) {
                      throw new SparkException(
                        s"Failed to store $pieceId of $broadcastId in local BlockManager")
                    }
                    blocks(pid) = b
                  case None =>
                    throw new SparkException(s"Failed to get $pieceId of $broadcastId")
                }
            }
      

      What these block manager methods have in common is that they bypass the encryption code. As a result, broadcast data is stored unencrypted in the block manager, and unencrypted data is written to disk whenever those blocks are evicted from memory.

      The correct fix here is actually not to change TorrentBroadcast, but to fix the block manager so that:

      • data stored in memory is not encrypted
      • data written to disk is encrypted

      This would simplify the code paths that use the BlockManager / SerializerManager APIs (see e.g. SPARK-19520), but it requires some tricky changes inside the BlockManager so that it can still use file channels and avoid reading whole blocks back into memory just to decrypt them.
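
      The split proposed above (plaintext in memory, ciphertext on disk) can be illustrated with a minimal sketch. This is not Spark's BlockManager: ToyBlockStore, StreamUtil, evictToDisk and openStream are hypothetical names, and the JDK's javax.crypto with AES/CTR/NoPadding is assumed purely for illustration. Reads from disk are decrypted as a stream, so a block never has to be loaded whole just to be decrypted.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, InputStream}
import java.nio.file.{Files, Path}
import java.security.SecureRandom
import javax.crypto.{Cipher, CipherInputStream, CipherOutputStream}
import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}

// Hypothetical toy store (not Spark's BlockManager): blocks are kept as
// plaintext in memory and are encrypted only when written to disk.
class ToyBlockStore(key: Array[Byte], dir: Path) {
  private val memory = scala.collection.mutable.Map.empty[String, Array[Byte]]
  private val IvLength = 16 // AES block size in bytes

  private def cipher(mode: Int, iv: Array[Byte]): Cipher = {
    val c = Cipher.getInstance("AES/CTR/NoPadding")
    c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv))
    c
  }

  // In-memory blocks are stored as-is, with no encryption.
  def putBytes(id: String, bytes: Array[Byte]): Unit = memory.update(id, bytes)

  // Eviction: write a fresh random IV, then the encrypted block contents.
  def evictToDisk(id: String): Unit = {
    val bytes = memory.remove(id).getOrElse(throw new NoSuchElementException(id))
    val iv = new Array[Byte](IvLength)
    new SecureRandom().nextBytes(iv)
    val out = Files.newOutputStream(dir.resolve(id))
    try {
      out.write(iv)
      val enc = new CipherOutputStream(out, cipher(Cipher.ENCRYPT_MODE, iv))
      enc.write(bytes)
      enc.close() // flushes the cipher and closes the file stream
    } finally out.close()
  }

  // Readers always see plaintext; on-disk blocks are decrypted lazily
  // while the caller consumes the stream.
  def openStream(id: String): InputStream = memory.get(id) match {
    case Some(bytes) => new ByteArrayInputStream(bytes)
    case None =>
      val in = Files.newInputStream(dir.resolve(id))
      val iv = new Array[Byte](IvLength)
      new DataInputStream(in).readFully(iv) // consume the IV header
      new CipherInputStream(in, cipher(Cipher.DECRYPT_MODE, iv))
  }
}

object StreamUtil {
  // Drains a stream in fixed-size chunks and closes it.
  def readAll(in: InputStream): Array[Byte] = {
    val buf = new ByteArrayOutputStream()
    val chunk = new Array[Byte](4096)
    var n = in.read(chunk)
    while (n != -1) {
      buf.write(chunk, 0, n)
      n = in.read(chunk)
    }
    in.close()
    buf.toByteArray
  }
}
```

      With this shape, callers such as a broadcast implementation would not need to know whether a block currently lives in memory or on disk; encryption becomes a detail of the disk path only. The real change is harder than this sketch suggests, because it also has to cover remote fetches and file-channel based reads.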

          People

            Assignee: vanzin Marcelo Masiero Vanzin
            Reporter: vanzin Marcelo Masiero Vanzin
            Votes: 0
            Watchers: 5
