In the patch for CASSANDRA-11206, I found that FileIndexInfoRetriever allocates a (potentially very large) IndexInfo array as a cache on every single read. The array can be as large as the number of IndexInfo entries in the RowIndexEntry.
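To give a rough sense of scale, the sketch below estimates how large such a per-read array could get. It is a back-of-the-envelope calculation, not a measurement: the class name, the 64 KiB index interval (assumed to match a `column_index_size_in_kb` of 64), the 4-byte reference size, and the 16-byte array header are all assumptions for illustration.

```java
// Hypothetical estimate of the per-read IndexInfo array size.
// Every constant here is an assumption, not a value taken from the patch.
class IndexInfoArrayEstimate {
    // Assumed: roughly one IndexInfo per 64 KiB of serialized partition data.
    static final long INDEX_INTERVAL_BYTES = 64L * 1024;
    // Assumed: 4-byte object references (compressed oops).
    static final long REFERENCE_BYTES = 4;
    // Assumed: 16-byte JVM array header.
    static final long ARRAY_HEADER_BYTES = 16;

    // Approximate number of IndexInfo entries for a partition of the given size.
    static long indexInfoCount(long partitionBytes) {
        return (partitionBytes + INDEX_INTERVAL_BYTES - 1) / INDEX_INTERVAL_BYTES;
    }

    // Approximate size of an array holding one reference per IndexInfo entry.
    static long arrayBytes(long partitionBytes) {
        return ARRAY_HEADER_BYTES + indexInfoCount(partitionBytes) * REFERENCE_BYTES;
    }
}
```

Under these assumptions, a 2 GiB partition has about 32768 IndexInfo entries, so the reference array alone is roughly 128 KiB, allocated again for each read.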
After some experiments using LargePartitionsTest on my MacBook, the results show that removing the IndexInfo cache from FileIndexInfoRetriever improves performance for large partitions, as shown below (latencies reduced by 41% and by 45%).
Code is here (based on trunk)
Heap memory usage while running LargePartitionsTest (except for the 8G test) with the array cache (original)
Heap memory usage while running LargePartitionsTest (except for the 8G test) without the cache
Of course, I also tried using some collection containers instead of a plain array, but I could not see an improvement large enough to justify keeping a cache built on them (unless I made a mistake or overlooked something in this test).
||LargePartitionsTest.test_12_2G||SELECTs 1 (ms)||SELECTs 2 (ms)||Scan (ms)||
|ConcurrentHashMap 2nd trial|44036|26895|17443|
|LinkedHashCache (capacity=16, limit=10, fifo) 1st|42668|32165|17323|
|LinkedHashCache (capacity=16, limit=10, fifo) 2nd|48863|28066|18053|
|LinkedHashCache (capacity=16, limit=16, fifo)|46979|29810|18620|
|LinkedHashCache (capacity=16, limit=10, lru)|46456|29749|20311|
|No Cache 2nd trial|46534|27670|18700|
Code that I used for this comparison is here. LinkedHashCache is a simple fifo/lru cache that extends LinkedHashMap.
Scan is the execution time taken to iterate through the large partition.
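For readers without access to the comparison code, here is a minimal sketch of what a fifo/lru cache built on LinkedHashMap can look like. It is an illustration only, not the class used in the benchmark: the constructor parameters mirror the `capacity`, `limit`, and fifo/lru settings named in the table, but their exact meaning in the real code is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a bounded cache; not the benchmarked implementation.
class LinkedHashCache<K, V> extends LinkedHashMap<K, V> {
    // Assumed meaning of "limit": maximum number of entries kept.
    private final int limit;

    // capacity: initial hash table capacity (assumed meaning).
    // lru: true orders entries by access (LRU), false by insertion (FIFO).
    LinkedHashCache(int capacity, int limit, boolean lru) {
        super(capacity, 0.75f, lru);
        this.limit = limit;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // evicts the eldest entry in the current iteration order.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > limit;
    }
}
```

With `lru` set to false, a `get` does not change eviction order, so the oldest inserted entry is dropped first. With `lru` set to true, a `get` moves the entry to the end of the order, so the least recently accessed entry is dropped instead.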
So, in this issue, I propose removing the IndexInfo cache from FileIndexInfoRetriever to improve performance on large partitions.