Details
- Improvement
- Status: In Progress
- Normal
- Resolution: Unresolved
- None
- Performance
- Normal
- All
- None
Description
Currently, in org.apache.cassandra.io.sstable.format.big.RowIndexEntry.ShallowInfoRetriever#fetchIndex we do 2 seek/read operations: the 1st finds the offset of the IndexInfo and the 2nd reads it. These are two quite distant regions of the file, so in the standard disk access mode we get no benefit from the buffer in RandomAccessReader: jumping between the two regions resets the buffer again and again. A possible improvement is to read and cache the first N offsets on the first read (N is bounded to limit memory use) and afterwards do only sequential reads of the IndexInfo data. By caching less than 1 KB we can reduce the number of syscalls even further; in my case, from a few hundred to fewer than 10.
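The proposal can be illustrated with a minimal, self-contained sketch. It is not Cassandra code: `CachedOffsetsSketch`, `CountingReader`, `buildFile` and the toy file layout (fixed-size int entries followed by a table of int offsets) are all hypothetical and exist only to show the access pattern. The `seeks` counter stands in for the buffer resets described above, and the 128-entry cache limit (512 bytes) is an assumed value chosen to stay under 1 KB.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of caching the first N offsets; not Cassandra's actual classes. */
class CachedOffsetsSketch {
    /** In-memory stand-in for a file reader; counts repositionings instead of real seeks. */
    static final class CountingReader {
        private final ByteBuffer data;
        private int position = 0;
        int seeks = 0;

        CountingReader(byte[] bytes) { this.data = ByteBuffer.wrap(bytes); }

        void seek(int newPosition) {
            // Only a real jump counts; continuing from the current position is sequential.
            if (newPosition != position) { seeks++; position = newPosition; }
        }

        int readInt() {
            int value = data.getInt(position);
            position += Integer.BYTES;
            return value;
        }
    }

    private final CountingReader reader;
    private final int offsetsStart;     // assumed: the offsets table follows the entries
    private final int[] cachedOffsets;  // the first N offsets, loaded on first use
    private boolean cacheLoaded = false;

    CachedOffsetsSketch(CountingReader reader, int offsetsStart, int entryCount, int maxCached) {
        this.reader = reader;
        this.offsetsStart = offsetsStart;
        this.cachedOffsets = new int[Math.min(entryCount, maxCached)];
    }

    /** Returns the payload of entry {@code index} (a single int in this toy layout). */
    int fetchEntry(int index) {
        int offset;
        if (index < cachedOffsets.length) {
            if (!cacheLoaded) {
                // One jump to the offsets region, then one sequential pass over N offsets.
                reader.seek(offsetsStart);
                for (int i = 0; i < cachedOffsets.length; i++) cachedOffsets[i] = reader.readInt();
                cacheLoaded = true;
            }
            offset = cachedOffsets[index];
        } else {
            // Beyond the cache limit: fall back to the two-jump path.
            reader.seek(offsetsStart + index * Integer.BYTES);
            offset = reader.readInt();
        }
        reader.seek(offset); // no jump if the previous entry was just read
        return reader.readInt();
    }

    /** Builds the toy file: entryCount int payloads, then entryCount int offsets. */
    static byte[] buildFile(int entryCount) {
        ByteBuffer buf = ByteBuffer.allocate(entryCount * 2 * Integer.BYTES);
        for (int i = 0; i < entryCount; i++) buf.putInt(1000 + i);
        for (int i = 0; i < entryCount; i++) buf.putInt(i * Integer.BYTES);
        return buf.array();
    }

    public static void main(String[] args) {
        int entries = 100;
        byte[] file = buildFile(entries);
        int offsetsStart = entries * Integer.BYTES;
        for (int maxCached : new int[] {0, 128}) { // 0 disables the cache; 128 ints = 512 bytes
            CountingReader reader = new CountingReader(file);
            CachedOffsetsSketch index = new CachedOffsetsSketch(reader, offsetsStart, entries, maxCached);
            for (int i = 0; i < entries; i++) index.fetchEntry(i);
            System.out.println("maxCached=" + maxCached + " seeks=" + reader.seeks);
        }
    }
}
```

In this toy model, a forward scan with the cache disabled jumps twice per entry, while with the cache enabled it jumps once to load the offsets and once back to the first entry, after which all reads are sequential. Real IndexInfo entries are variable-length and a real reader refills its buffer in chunks, so the absolute numbers will differ.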
Issue Links
- is related to
  - CASSANDRA-19557 ShallowIndexedEntry scenario: the same IndexInfo is read multiple times, per every read row (Resolved)