Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-6128

Spill-to-disk Encryption(AES-CFB + SHA256) can be a performance bottleneck while IO is getting faster

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • Impala 2.12.0
    • Backend

    Description

      Currently, Impala's encryption(AES-CFB + SHA256 - see be/src/util/openssl-util.h) can be a bottleneck while IO is getting faster.
      The throughput of AES-CFB + SHA256 is about ~200~300MB/s, while nowadays' SSD throughput can be up to GB/s. for instance, the read throughput is ~2600MB/s in Intel's DC P3600, and write throughput is 1700MB/s. And the coming Intel's Optance is getting more faster.
      If the customers who care about security and turn on the flag. Shuffle temp file can be a performance bottleneck. if we replace CFB+SHA256 with AES-GCM, Encryption/Decryption can be ~10x faster.

      Brief introduction to AES-CTR & AES-GCM

      Confidentiality Modes: CFB & CTR

      • Both are Stream Ciphers
      • provable-security when use different nonce/IV for every message

      But, CTR has its advantages:

      • Hardware efficiency on an x86
      • Random-access
      • encryption/description
        The CTR mode can be parallelized in instruction level(ILP), it is about 4~6 times faster than CFB on x86 platform. its implementation is well-optimized in OpenSSL or JVM on x86 platform.

      "It is hard to think of any modern, bulk-privacy application scenario where any of the “original
      four” blockcipher modes—ECB, CBC, CFB, or OFB—make more sense than CTR."
      --Phillip Rogaway

      Confidentiality + Integrity
      AES-GCM is a relatively new standard (2008). It is a combination of CTR and GMAC. GCM has both encryption and message integrity. AES-GCM was fully supported since OpenSSL 1.0.1d. Intel has added a carry-less-multiplication instruction (PCLMULQDQ) since Westmere.

      • GCM is already widely used.
      • provable-security, it is fragile only if you re-use an IV like CTR/CFB.

      GCM is a very fast but arguably complex combination of CTR mode and GHASH. Luckily, we don't have to implement it. The well-optimized implementation(Prof. Shay Gueron's algorithm) with hardware acceleration(AES & PCLMULQDQ) has been adopted in OpenSSL, Linux, go language...

      References:
      [AES-GCM for Efficient Authenticated Encryption –
      Ending the Reign of HMAC-SHA-1? ](https://crypto.stanford.edu/RealWorldCrypto/slides/gueron.pdf)
      [Evaluation of Some Blockcipher Modes of Operation](http://web.cs.ucdavis.edu/~rogaway/papers/modes.pdf)

      mirco-benchmark

      Here is the mirco-benchmark on my desktop(Memory 16G, CPU: i5-4590 CPU @ 3.30GHz):

      OpenSSL 1.0.2g,
      OpenSSL CTR Encryption (Total=1024MB, key=256bits, Chunk= 16KB)	throughput= 3202.58MB/s.
      OpenSSL CTR Encryption (Total=1024MB, key=256bits, Chunk=  1MB)	throughput= 3241.76MB/s.
      OpenSSL CTR Decryption (Total=1024MB, key=256bits, Chunk= 16KB)	throughput= 3199.91MB/s.
      OpenSSL CTR Decryption (Total=1024MB, key=256bits, Chunk=  1MB)	throughput= 3231.22MB/s.
      
      OpenSSL CFB Encryption (Total=1024MB, key=256bits, Chunk= 16KB)	throughput= 427.07MB/s.
      OpenSSL CFB Encryption (Total=1024MB, key=256bits, Chunk=  1MB)	throughput= 423.92MB/s.
      OpenSSL CFB Decryption (Total=1024MB, key=256bits, Chunk= 16KB)	throughput= 425.87MB/s.
      OpenSSL CFB Decryption (Total=1024MB, key=256bits, Chunk=  1MB)	throughput= 423.44MB/s.
      
      OpenSSL SHA256 Encryption (Total=64MB, key=256bits, Chunk= 64KB)	throughput= 449.48MB/s.
      OpenSSL SHA256 Encryption (Total=1024MB, key=256bits, Chunk=  1MB)	throughput= 446.63MB/s.
      
      OpenSSL GCM Encryption (Total=1024MB, key=256bits, Chunk= 16KB)	throughput= 2340.80MB/s.
      OpenSSL GCM Encryption (Total=1024MB, key=256bits, Chunk=  1MB)	throughput= 2366.55MB/s.
      
      OpenSSL CFB+SHA256 Encryption (Total=1024MB, key=256bits, Chunk= 16KB)	throughput= 218.77MB/s.
      OpenSSL CFB+SHA256 Encryption (Total=1024MB, key=256bits, Chunk=  1MB)	throughput= 220.53MB/s.
      OpenSSL CFB+SHA256 Decryption (Total=1024MB, key=256bits, Chunk= 16KB)	throughput= 219.10MB/s.
      OpenSSL CFB+SHA256 Decryption (Total=1024MB, key=256bits, Chunk=  1MB)	throughput= 219.92MB/s.
      

      We can see that GCM is ~10 times faster than CFB+SHA256

      Solutions

      Option A: if replace CFB+SHA256 with AES-GCM. Encryption/Decryption can be ~10x faster.
      Option B: Just replace CFB with CTR, it is very simple, and ~70% performance gain.

      folks, any comments? I will upload the patches soon.

      Attachments

        Activity

          People

            Unassigned Unassigned
            kexianda Xianda Ke
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: