SPARK-44588

Migrated shuffle blocks are encrypted multiple times when io.encryption is enabled

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1
    • Fix Version/s: 3.3.3, 3.4.2, 3.5.0
    • Component/s: Spark Core
    • Labels: None

    Description

      When shuffle blocks are migrated, the receiver wraps them for encryption a second time while writing them out to a file.

      Pull request to fix this: https://github.com/apache/spark/pull/42214


      Details:

      Sender/Read side:

      BlockManagerDecommissioner:run()
          blocks = bm.migratableResolver.getMigrationBlocks()
              dataFile = IndexShuffleBlockResolver:getDataFile(...)
              buffer = FileSegmentManagedBuffer(..., dataFile)
                       ^ This reads straight from disk without decryption
          blocks.foreach((blockId, buffer) => bm.blockTransferService.uploadBlockSync(..., buffer, ...))
              -> uploadBlockSync() -> uploadBlock(..., buffer, ...)
                  -> client.uploadStream(UploadBlockStream, buffer, ...)

      • Notice that there is no decryption here on the sender/read side.

      Receiver/Write side:

      NettyBlockRpcServer:receiveStream() <--- This is the UploadBlockStream handler
          putBlockDataAsStream()
              migratableResolver.putShuffleBlockAsStream()
                  -> file = IndexShuffleBlockResolver:getDataFile(...)
                  -> tmpFile = (file + .<uuid> extension)
                  -> Creates an encrypting writable channel to the tmpFile using serializerManager.wrapStream()
                  -> onData() writes the data into the channel
                  -> onComplete() renames the tmpFile to the file

      • Notice:
      • Both getMigrationBlocks()[read] and putShuffleBlockAsStream()[write] target IndexShuffleBlockResolver:getDataFile()
      • The read path does not decrypt but the write path encrypts.
      • As a thought exercise: assume the shuffle blocks start out unencrypted*. If this cycle happens more than once (the receiver later becomes a sender), the bytes in the file gain another layer of encryption each time the block is migrated.
      • *In practice, the shuffle blocks are already encrypted on disk to begin with; this is just a thought exercise.
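
The layering described above can be reproduced with a small, self-contained sketch. It is not Spark code: `wrap`, `unwrap` and `migrate` are hypothetical stand-ins for the encrypting and decrypting stream wrappers and for one migration hop, and the cipher is a toy SHA-256 keystream chosen only so the example runs on the Python standard library. It also assumes that each wrap writes a fresh IV header ahead of the ciphertext.

```python
import hashlib
import os

IV_LEN = 16  # toy IV size; an assumption of this sketch


def _keystream(key: bytes, iv: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from key and IV (toy construction, not AES)."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def wrap(key: bytes, data: bytes) -> bytes:
    """Hypothetical encrypting wrapper: fresh IV header, then ciphertext."""
    iv = os.urandom(IV_LEN)
    ks = _keystream(key, iv, len(data))
    return iv + bytes(a ^ b for a, b in zip(data, ks))


def unwrap(key: bytes, blob: bytes) -> bytes:
    """Hypothetical decrypting wrapper: strip the IV header, then decrypt."""
    iv, body = blob[:IV_LEN], blob[IV_LEN:]
    ks = _keystream(key, iv, len(body))
    return bytes(a ^ b for a, b in zip(body, ks))


def migrate(key: bytes, on_disk: bytes, rewrap: bool) -> bytes:
    """One migration hop: the sender uploads the on-disk bytes untouched,
    and the receiver either wraps them again (the bug) or stores them as-is."""
    uploaded = on_disk  # read side: no decryption
    return wrap(key, uploaded) if rewrap else uploaded


key = b"k" * 16
plaintext = b"shuffle block payload"
on_disk = wrap(key, plaintext)  # the block is already encrypted on disk

# Two hops with the re-wrapping receiver versus two hops without it.
buggy = migrate(key, migrate(key, on_disk, rewrap=True), rewrap=True)
fixed = migrate(key, migrate(key, on_disk, rewrap=False), rewrap=False)

# Each re-wrapping hop adds one more IV header and one more cipher layer.
print(len(buggy) - len(on_disk) == 2 * IV_LEN)
# A reader that decrypts once only recovers the block in the pass-through case.
print(unwrap(key, fixed) == plaintext)
print(unwrap(key, buggy) == plaintext)
# The re-wrapped block needs one unwrap per layer before it is readable again.
print(unwrap(key, unwrap(key, unwrap(key, buggy))) == plaintext)
```

In this sketch a reader that decrypts once, as a normal shuffle read would, sees the original payload only when the receiver stores the uploaded bytes unchanged. The sketch says nothing about how the linked pull request implements the fix.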

            People

              Assignee: Henry Mai (henrymai)
              Reporter: Henry Mai (henrymai)