Status: Resolved

Resolution: Fixed

Description
Currently, Impala's encryption (AES-CFB + SHA256, see be/src/util/openssl-util.h) can become a bottleneck as I/O gets faster.
The throughput of AES-CFB + SHA256 is roughly 200 to 300 MB/s, while modern SSDs can reach GB/s. For instance, Intel's DC P3600 reads at about 2600 MB/s and writes at about 1700 MB/s, and Intel's upcoming Optane is faster still.
For customers who care about security and turn on the flag, encrypting shuffle temp files can be a performance bottleneck. If we replace CFB+SHA256 with AES-GCM, encryption/decryption can be ~10x faster.
Brief introduction to AES-CTR & AES-GCM
Confidentiality Modes: CFB & CTR
 Both turn a block cipher into a stream cipher
 Provable security when a different nonce/IV is used for every message
But CTR has its advantages:
 Hardware efficiency on x86
 Random access
 Encryption/decryption
CTR mode can be parallelized at the instruction level (ILP); it is about 4 to 6 times faster than CFB on x86. Its implementation is well optimized in OpenSSL and the JVM on x86.
"It is hard to think of any modern, bulk-privacy application scenario where any of the “original
four” blockcipher modes—ECB, CBC, CFB, or OFB—make more sense than CTR."
Phillip Rogaway
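The random-access and parallelism properties of CTR listed above follow from its structure: keystream block i depends only on (key, nonce, i), never on earlier ciphertext as in CFB. Below is a minimal sketch of that structure. `ToyBlock`, `CtrXorRange` and `RandomAccessDemo` are hypothetical names, and `ToyBlock` is a non-cryptographic stand-in for the AES block function, used only so the example has no dependencies. It is not how Impala or OpenSSL implement CTR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for the AES block function. NOT cryptographically secure.
// It only has to be a deterministic function of (key, nonce, counter).
static uint64_t ToyBlock(uint64_t key, uint64_t nonce, uint64_t counter) {
  uint64_t z = key ^ (nonce * 0x9E3779B97F4A7C15ULL) ^ counter;
  z += 0x9E3779B97F4A7C15ULL;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

constexpr size_t kBlockBytes = sizeof(uint64_t);

// XOR bytes [offset, offset + len) of `data` in place with the keystream.
// Each keystream block depends only on its index, so any byte range can be
// processed independently. Encryption and decryption are the same operation.
static void CtrXorRange(std::vector<uint8_t>& data, size_t offset, size_t len,
                        uint64_t key, uint64_t nonce) {
  for (size_t i = offset; i < offset + len && i < data.size(); ++i) {
    const uint64_t ks = ToyBlock(key, nonce, i / kBlockBytes);
    data[i] ^= static_cast<uint8_t>(ks >> (8 * (i % kBlockBytes)));
  }
}

// Encrypts a whole buffer, then decrypts only bytes [500, 600) and checks
// that this slice matches the plaintext without touching the other bytes.
static bool RandomAccessDemo() {
  std::vector<uint8_t> plain(1024);
  for (size_t i = 0; i < plain.size(); ++i) {
    plain[i] = static_cast<uint8_t>(i * 7);
  }
  std::vector<uint8_t> buf = plain;
  const uint64_t key = 0x0123456789ABCDEFULL;
  const uint64_t nonce = 42;
  CtrXorRange(buf, 0, buf.size(), key, nonce);  // encrypt everything
  CtrXorRange(buf, 500, 100, key, nonce);       // decrypt one slice only
  return std::equal(plain.begin() + 500, plain.begin() + 600,
                    buf.begin() + 500);
}
```

Because no iteration of the loop in `CtrXorRange` depends on a previous one, the block computations can be pipelined or split across threads. In CFB, encrypting block i needs the ciphertext of block i-1, which serializes the work.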
Confidentiality + Integrity
AES-GCM is a relatively new standard (2008). It combines CTR with GMAC, so it provides both encryption and message integrity. AES-GCM has been fully supported since OpenSSL 1.0.1d. Intel has provided a carry-less multiplication instruction (PCLMULQDQ) since Westmere.
 GCM is already widely used.
 Provable security; like CTR/CFB, it is fragile only if you reuse an IV.
GCM is a very fast but arguably complex combination of CTR mode and GHASH. Luckily, we don't have to implement it. The well-optimized implementation (Prof. Shay Gueron's algorithm) with hardware acceleration (AES & PCLMULQDQ) has been adopted in OpenSSL, Linux, the Go language...
References:
[AES-GCM for Efficient Authenticated Encryption – Ending the Reign of HMAC-SHA-1?](https://crypto.stanford.edu/RealWorldCrypto/slides/gueron.pdf)
[Evaluation of Some Blockcipher Modes of Operation](http://web.cs.ucdavis.edu/~rogaway/papers/modes.pdf)
Microbenchmark
Here is the microbenchmark on my desktop (Memory: 16 GB, CPU: i5-4590 @ 3.30GHz):
OpenSSL 1.0.2g
OpenSSL CTR Encryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 3202.58MB/s.
OpenSSL CTR Encryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 3241.76MB/s.
OpenSSL CTR Decryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 3199.91MB/s.
OpenSSL CTR Decryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 3231.22MB/s.
OpenSSL CFB Encryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 427.07MB/s.
OpenSSL CFB Encryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 423.92MB/s.
OpenSSL CFB Decryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 425.87MB/s.
OpenSSL CFB Decryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 423.44MB/s.
OpenSSL SHA256 Encryption (Total=64MB, key=256bits, Chunk= 64KB) throughput= 449.48MB/s.
OpenSSL SHA256 Encryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 446.63MB/s.
OpenSSL GCM Encryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 2340.80MB/s.
OpenSSL GCM Encryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 2366.55MB/s.
OpenSSL CFB+SHA256 Encryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 218.77MB/s.
OpenSSL CFB+SHA256 Encryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 220.53MB/s.
OpenSSL CFB+SHA256 Decryption (Total=1024MB, key=256bits, Chunk= 16KB) throughput= 219.10MB/s.
OpenSSL CFB+SHA256 Decryption (Total=1024MB, key=256bits, Chunk= 1MB) throughput= 219.92MB/s.
We can see that GCM is ~10 times faster than CFB+SHA256.
Solutions
Option A: Replace CFB+SHA256 with AES-GCM. Encryption/decryption can be ~10x faster.
Option B: Just replace CFB with CTR. It is very simple and gives a ~70% performance gain.
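As a rough cross-check of the two options against the 16KB-chunk microbenchmark figures, here is a small sketch. It assumes the cipher and the hash run one after the other over the same bytes, so their per-byte times add, and it assumes Option B keeps the SHA256 integrity check. `SerialThroughput` and the other names are hypothetical helpers, not part of Impala.

```cpp
#include <cassert>
#include <cmath>

// Measured single-stage throughputs in MB/s (16KB chunks, from the
// microbenchmark in this ticket).
constexpr double kCfb = 427.07;
constexpr double kCtr = 3202.58;
constexpr double kSha256 = 449.48;
constexpr double kGcm = 2340.80;
constexpr double kCfbSha256Measured = 218.77;

// Throughput of two stages applied back to back to the same data.
// Time per MB adds up, so the rates combine harmonically.
double SerialThroughput(double a_mb_s, double b_mb_s) {
  return 1.0 / (1.0 / a_mb_s + 1.0 / b_mb_s);
}

// Model check: predicted CFB+SHA256 rate, to compare with the measured one.
double PredictedCfbSha256() { return SerialThroughput(kCfb, kSha256); }

// Option A: GCM alone (it already provides integrity) vs measured CFB+SHA256.
double OptionASpeedup() { return kGcm / kCfbSha256Measured; }

// Option B: estimated CTR+SHA256 rate vs measured CFB+SHA256.
double OptionBSpeedup() {
  return SerialThroughput(kCtr, kSha256) / kCfbSha256Measured;
}
```

Under these assumptions the model predicts about 219 MB/s for CFB+SHA256, which agrees with the measured value, gives roughly 10.7x for Option A, and estimates about 394 MB/s (around 1.8x) for Option B. That is the same ballpark as the ~70% gain quoted above; the remaining SHA256 pass is what caps Option B.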
Folks, any comments? I will upload the patches soon.