Description
While looking at the performance of a workload that creates many short-lived remote connections to a secured DN, Philip and I found very high system CPU usage. We tracked it down to reads from /dev/random, which the DN incurs by using CryptoCodec.generateSecureRandom to generate a transient session key and IV for AES encryption.
When the OpenSSL codec is not enabled, the above code falls through to the JDK SecureRandom implementation, which performs reasonably well. However, OpenSSLCodec defaults to using OsSecureRandom, which reads all of its random data from /dev/random rather than doing something more efficient such as initializing a CSPRNG from a small seed.
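For context, here is a minimal sketch of the kind of call that triggers these reads, assuming the stock CryptoCodec API (CryptoCodec.getInstance and generateSecureRandom); this is an illustration, not the actual DN code:

    // Illustrative sketch only, not the DataNode code path: the expensive call is
    // CryptoCodec.generateSecureRandom(), which with libhadoop.so loaded is backed
    // by OsSecureRandom and reads straight from /dev/random.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.crypto.CryptoCodec;

    public class SessionKeySketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        CryptoCodec codec = CryptoCodec.getInstance(conf);

        byte[] key = new byte[16];        // transient AES session key
        byte[] iv = new byte[16];         // AES block-sized IV
        codec.generateSecureRandom(key);  // each call reads from the random source
        codec.generateSecureRandom(iv);
      }
    }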
I wrote a simple JMH benchmark to compare various approaches when running with concurrency 10 (a sketch of the harness follows the list):
testHadoop - using CryptoCodec
testNewSecureRandom - using 'new SecureRandom()' each iteration
testSha1PrngNew - using the SHA1PRNG explicitly, new instance each iteration
testSha1PrngShared - using a single shared instance of SHA1PRNG
testSha1PrngThread - using a thread-specific instance of SHA1PRNG
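The following is a rough sketch of the harness, assuming plain JMH @Benchmark methods with @Threads(10) for the concurrency-10 setup; the testHadoop case is omitted here because it simply calls CryptoCodec.generateSecureRandom as in the sketch above.

    import java.security.NoSuchAlgorithmException;
    import java.security.SecureRandom;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Threads;

    @Threads(10)
    public class MyBenchmark {

      private static final SecureRandom SHARED = newSha1Prng();

      private static final ThreadLocal<SecureRandom> PER_THREAD =
          ThreadLocal.withInitial(MyBenchmark::newSha1Prng);

      private static SecureRandom newSha1Prng() {
        try {
          return SecureRandom.getInstance("SHA1PRNG");
        } catch (NoSuchAlgorithmException e) {
          throw new RuntimeException(e);
        }
      }

      @Benchmark
      public byte[] testNewSecureRandom() {
        byte[] key = new byte[16];
        new SecureRandom().nextBytes(key);   // fresh default SecureRandom per call
        return key;
      }

      @Benchmark
      public byte[] testSha1PrngNew() {
        byte[] key = new byte[16];
        newSha1Prng().nextBytes(key);        // fresh SHA1PRNG instance per call
        return key;
      }

      @Benchmark
      public byte[] testSha1PrngShared() {
        byte[] key = new byte[16];
        SHARED.nextBytes(key);               // one shared (internally synchronized) instance
        return key;
      }

      @Benchmark
      public byte[] testSha1PrngThread() {
        byte[] key = new byte[16];
        PER_THREAD.get().nextBytes(key);     // one SHA1PRNG per thread
        return key;
      }
    }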
Benchmark                        Mode   Cnt        Score  Error  Units
MyBenchmark.testHadoop           thrpt         1293.000          ops/s  [with libhadoop.so]
MyBenchmark.testHadoop           thrpt       461515.697          ops/s  [without libhadoop.so]
MyBenchmark.testNewSecureRandom  thrpt        43413.640          ops/s
MyBenchmark.testSha1PrngNew      thrpt       395515.000          ops/s
MyBenchmark.testSha1PrngShared   thrpt       164488.713          ops/s
MyBenchmark.testSha1PrngThread   thrpt      4295123.210          ops/s
In other words, the presence of the OpenSSL acceleration slows this code path down by roughly 356x. Compared to the optimal approach (a thread-local SHA1PRNG), it is about 3321x slower.
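As a possible mitigation (not verified here), the random source used by the OpenSSL codec appears to be configurable via the hadoop.security.* keys; a sketch, assuming those keys behave as described in core-default.xml for the Hadoop version in use:

    // Hypothetical mitigation sketch; key names taken from
    // CommonConfigurationKeysPublic, verify against your Hadoop version.
    Configuration conf = new Configuration();

    // Point OsSecureRandom at the non-blocking device instead of /dev/random.
    // It still reads every byte from the device, so syscall overhead remains.
    conf.set("hadoop.security.random.device.file.path", "/dev/urandom");

    // Or swap the Random implementation used by the OpenSSL codec, e.g. to
    // OpenSSL's own CSPRNG, which is seeded rather than reading the device per call.
    conf.set("hadoop.security.secure.random.impl",
        "org.apache.hadoop.crypto.random.OpensslSecureRandom");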