We planned to bump the base RocksDB version from 5.17.2 to 6.11.x. However, we observed a performance regression when comparing 5.17.2 and 5.18.3 with our own flink-benchmarks, and reported it to the RocksDB community in rocksdb#5774. Since RocksDB 5.18.3 is a bit old for the RocksDB community, and the RocksDB built-in db_bench tool cannot easily reproduce this regression, we did not get any effective help from the RocksDB community.
Since the code freeze of Flink release-1.12 is close, we have to figure it out ourselves. We first tried to use the RocksDB built-in db_bench tool to binary search the 160 commits between RocksDB 5.17.2 and 5.18.3, but the performance regression was not clear there. After switching to our own flink-benchmarks, we finally found the commit that introduced the nearly 10% performance regression: the one that replaced __thread with the thread_local keyword.
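The binary search over commits described above can be sketched as follows. This is only an illustration, not the tooling we actually ran: `first_bad_commit` and the `is_regressed` callback are hypothetical names, and the callback stands in for "build this commit and run the benchmark". It assumes the commits are ordered oldest to newest, the first one is known good, the last one is known bad, and the regression persists once introduced.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first commit for which
// is_regressed() reports a regression.
// Assumptions: commits[0] is known good, commits.back() is known bad,
// and every commit after the culprit is also regressed.
std::size_t first_bad_commit(
    const std::vector<std::string>& commits,
    const std::function<bool(const std::string&)>& is_regressed) {
    assert(commits.size() >= 2);
    std::size_t good = 0;                   // newest commit known to be fine
    std::size_t bad = commits.size() - 1;   // oldest commit known to be regressed
    while (bad - good > 1) {
        const std::size_t mid = good + (bad - good) / 2;
        if (is_regressed(commits[mid])) {
            bad = mid;    // culprit is at mid or earlier
        } else {
            good = mid;   // culprit is after mid
        }
    }
    return bad;
}
```

With 160 commits this needs at most eight benchmark runs, which matters when each run means rebuilding RocksDB and executing a full benchmark.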
From existing knowledge, the performance regression of thread_local is known since the gcc-4.8 changes and becomes more serious when used in dynamic modules [tls benchmark]. That could explain why the RocksDB built-in db_bench tool cannot reproduce this regression, as it is compiled in static mode as recommended.
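For readers less familiar with the two spellings, here is a minimal sketch of the difference. The counters and helper functions are hypothetical and are not RocksDB code. `__thread` is a GCC/Clang extension that only allows constant initialization, while the C++11 `thread_local` keyword also allows dynamic initialization and non-trivial destructors, so the compiler may have to go through an initialization wrapper on access in some situations. This snippet only shows the syntax and the per-thread semantics; in a small statically linked program like this it is not expected to reproduce the regression.

```cpp
#include <cassert>
#include <thread>

// Hypothetical per-thread counters, for illustration only.
// GCC/Clang extension: constant initialization only.
static __thread long legacy_counter = 0;
// C++11 keyword: also permits dynamic initialization and destructors.
static thread_local long modern_counter = 0;

// Increments the calling thread's __thread counter n times.
long bump_legacy(int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) ++legacy_counter;
    return legacy_counter;
}

// Increments the calling thread's thread_local counter n times.
long bump_modern(int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) ++modern_counter;
    return modern_counter;
}

// Runs bump(n) on a fresh thread and returns the value that thread saw.
long run_on_new_thread(long (*bump)(int), int n) {
    long seen = -1;
    std::thread worker([&seen, bump, n] { seen = bump(n); });
    worker.join();
    return seen;
}
```

Both variables behave the same from the caller's point of view: every thread gets its own copy. Only the generated access code can differ, which is why the change looked harmless in the original commit.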
We plan to fix this in our FRocksDB branch first by reverting the related changes. My current local experiments show that the revert is effective in avoiding the performance regression.