Details
Type: Improvement
Status: Resolved
Priority: Normal
Resolution: Fixed
Description
When streaming snapshotted SSTables from Cassandra Sidecar, Sidecar will perform multiple filesystem calls:
- Traverse the data directories to determine the keyspace / table path
- Once found, determine whether the SSTable file exists under the snapshots directory
- Read the filesystem to obtain the file type and file size
- Read the requested range of the file and stream it
The number of filesystem calls is manageable when streaming a single SSTable. However, when one or more clients read many SSTables, as in Cassandra Analytics bulk reads, hundreds to thousands of requests are issued, and every request repeats the system calls above.
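To make the per-request cost concrete, the steps above can be sketched roughly as follows. This is an illustrative sketch only, not Sidecar's actual code: the class name `UncachedSnapshotRead`, the method `readRange`, and its parameters are hypothetical, and it assumes the usual on-disk layout `<data_dir>/<keyspace>/<table>-<tableId>/snapshots/<snapshot>/<file>`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the filesystem work done for one uncached range request. */
final class UncachedSnapshotRead {
    static byte[] readRange(List<Path> dataDirs, String keyspace, String table,
                            String snapshot, String fileName,
                            long offset, int length) throws IOException {
        for (Path dataDir : dataDirs) {
            // Step 1: traverse the data directories to find the keyspace / table path.
            Path keyspaceDir = dataDir.resolve(keyspace);
            if (!Files.isDirectory(keyspaceDir)) {
                continue;
            }
            try (DirectoryStream<Path> tableDirs =
                     Files.newDirectoryStream(keyspaceDir, table + "-*")) {
                for (Path tableDir : tableDirs) {
                    // Step 2: check that the file exists under the snapshots directory.
                    Path file = tableDir.resolve("snapshots").resolve(snapshot).resolve(fileName);
                    if (!Files.exists(file)) {
                        continue;
                    }
                    // Step 3: read the file type and size from the filesystem.
                    BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
                    if (!attrs.isRegularFile()) {
                        continue;
                    }
                    // Step 4: read the requested range, clamped to the file size.
                    long end = Math.min(attrs.size(), offset + length);
                    if (offset >= end) {
                        return new byte[0];
                    }
                    ByteBuffer buffer = ByteBuffer.allocate((int) (end - offset));
                    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                        long position = offset;
                        while (buffer.hasRemaining()) {
                            int read = channel.read(buffer, position);
                            if (read < 0) {
                                break; // file shrank while reading
                            }
                            position += read;
                        }
                    }
                    return Arrays.copyOf(buffer.array(), buffer.position());
                }
            }
        }
        throw new NoSuchFileException(fileName);
    }
}
```

Every call repeats the directory checks and attribute reads, even when thousands of requests target the same snapshot directory, which is the overhead the caches proposed here aim to remove.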
This improvement proposes introducing two caches to reduce the number of system calls while streaming SSTables:
1. Cache all data file locations: The locations are cached once because they do not change during the lifecycle of the application. The values come from the Storage Service MBean getAllDataFileLocations method.
2. Snapshot list cache: Maintains a cache of recently listed snapshot files under a snapshot directory. This cache avoids accessing the filesystem every time a bulk read client lists the snapshot directory. It is a short-lived cache and can be disabled if the snapshot list is expected to be large.
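A minimal sketch of how the two caches could fit together is shown below. It is not the implementation from this ticket: `SnapshotCaches` and its method names are hypothetical, the data-directory loader stands in for a call to the Storage Service MBean `getAllDataFileLocations` method, and the time-based expiry is only one possible reading of "short-lived".

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of the two proposed caches; names are illustrative only. */
final class SnapshotCaches {
    /** A cached directory listing together with the time it was loaded. */
    private static final class Entry {
        final List<String> files;
        final long loadedAtNanos;

        Entry(List<String> files, long loadedAtNanos) {
            this.files = files;
            this.loadedAtNanos = loadedAtNanos;
        }
    }

    private final Supplier<List<String>> dataDirLoader;          // assumed to wrap the MBean call
    private final Function<String, List<String>> snapshotLister; // assumed to list one snapshot directory
    private final long ttlNanos;                                 // zero or negative disables the list cache
    private final LongSupplier nanoClock;                        // injectable clock, e.g. System::nanoTime
    private final Map<String, Entry> listings = new ConcurrentHashMap<>();
    private volatile List<String> dataDirs;

    SnapshotCaches(Supplier<List<String>> dataDirLoader,
                   Function<String, List<String>> snapshotLister,
                   Duration snapshotListTtl,
                   LongSupplier nanoClock) {
        this.dataDirLoader = dataDirLoader;
        this.snapshotLister = snapshotLister;
        this.ttlNanos = snapshotListTtl.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Cache 1: loaded once and kept for the lifetime of the application. */
    List<String> dataFileLocations() {
        List<String> cached = dataDirs;
        if (cached == null) {
            synchronized (this) {
                cached = dataDirs;
                if (cached == null) {
                    cached = List.copyOf(dataDirLoader.get());
                    dataDirs = cached;
                }
            }
        }
        return cached;
    }

    /** Cache 2: short-lived listing cache keyed by snapshot directory. */
    List<String> listSnapshot(String snapshotDir) {
        if (ttlNanos <= 0) {
            return snapshotLister.apply(snapshotDir); // cache disabled: always hit the filesystem
        }
        long now = nanoClock.getAsLong();
        Entry entry = listings.get(snapshotDir);
        if (entry != null && now - entry.loadedAtNanos < ttlNanos) {
            return entry.files;
        }
        // Expired entries are only replaced on the next request; a production cache
        // would also bound its size and evict stale entries.
        Entry fresh = new Entry(List.copyOf(snapshotLister.apply(snapshotDir)), now);
        listings.put(snapshotDir, fresh);
        return fresh.files;
    }
}
```

In real use the clock would be `System::nanoTime`; passing `Duration.ZERO` as the TTL corresponds to disabling the snapshot list cache when listings are expected to be large.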
Issue Links
- relates to: CASSANDRA-18111 Centralize all snapshot operations to SnapshotManager and cache snapshots (Resolved)
- links to