  1. Sidecar for Apache Cassandra
  2. CASSSIDECAR-94

Reduce filesystem calls while streaming SSTables

Details

    Description

      When streaming snapshotted SSTables, Cassandra Sidecar performs multiple filesystem calls for each request:

      • Traverse the data directories to determine the keyspace / table path
      • Once found, determine whether the SSTable file exists under the snapshots directory
      • Read the filesystem to obtain the file type and file size
      • Read the requested range of the file and stream it
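
      The four steps above can be sketched roughly as follows. This is an illustration only, not Sidecar's actual code: the class and method names are hypothetical, and the on-disk layout `<dataDir>/<keyspace>/<table>-<id>/snapshots/<snapshot>/<file>` is an assumption made for the example. The point is that every call touches the filesystem several times.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;

// Hypothetical helper (not part of Sidecar) showing the uncached, per-request work.
final class UncachedSnapshotLookup {

    // Steps 1 and 2: traverse the data directories to find the keyspace/table
    // path, then check whether the file exists under the snapshots directory.
    static Path resolve(List<Path> dataDirs, String keyspace, String table,
                        String snapshot, String fileName) throws IOException {
        for (Path dataDir : dataDirs) {
            Path keyspaceDir = dataDir.resolve(keyspace);
            if (!Files.isDirectory(keyspaceDir)) {
                continue;
            }
            // Assumed layout: table directories are named "<table>-<id>".
            try (DirectoryStream<Path> tableDirs =
                     Files.newDirectoryStream(keyspaceDir, table + "-*")) {
                for (Path tableDir : tableDirs) {
                    Path candidate = tableDir.resolve("snapshots")
                                             .resolve(snapshot)
                                             .resolve(fileName);
                    if (Files.exists(candidate)) {
                        return candidate;
                    }
                }
            }
        }
        throw new NoSuchFileException(fileName);
    }

    // Steps 3 and 4: read the file type and size, then read the requested range.
    static byte[] readRange(Path file, long offset, int length) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        if (!attrs.isRegularFile()) {
            throw new IOException("Not a regular file: " + file);
        }
        if (offset < 0 || length < 0 || offset + length > attrs.size()) {
            throw new IllegalArgumentException("Requested range is outside the file");
        }
        ByteBuffer buffer = ByteBuffer.allocate(length);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = offset;
            while (buffer.hasRemaining()) {
                int read = channel.read(buffer, position);
                if (read < 0) {
                    break; // unexpected end of file
                }
                position += read;
            }
        }
        return buffer.array();
    }
}
```

      A real server would stream the range to the client rather than buffer it in memory; the byte array only keeps the sketch short.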

      The number of filesystem calls is manageable when streaming a single SSTable. However, when one or more clients read many SSTables, for example during Cassandra Analytics bulk reads, hundreds to thousands of requests are made, and every request repeats the filesystem calls above.

      This improvement proposes introducing two caches to reduce the number of filesystem calls while streaming SSTables:

      1. Data file locations cache: caches all data file locations. The value is loaded once and does not change during the lifecycle of the application. The values come from the Storage Service MBean getAllDataFileLocations method.
      2. Snapshot list cache: maintains a cache of recently listed snapshot files under a snapshot directory. This cache avoids accessing the filesystem every time a bulk read client lists the snapshot directory. It is a short-lived cache and can be disabled if the snapshot list is expected to be large.
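
      A minimal sketch of the two caches, under stated assumptions: the class names, constructor parameters, and time-to-live handling are hypothetical and are not taken from the Sidecar codebase. The `Supplier<String[]>` stands in for the JMX call to getAllDataFileLocations, and the `Function` stands in for whatever code lists a snapshot directory on disk. The clock is injected so that expiry can be exercised without sleeping.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Cache 1 (hypothetical): data file locations, loaded at most once per application lifetime.
final class DataFileLocationsCache {
    private final Supplier<String[]> loader; // stand-in for the getAllDataFileLocations JMX call
    private volatile List<String> locations;

    DataFileLocationsCache(Supplier<String[]> loader) {
        this.loader = loader;
    }

    List<String> get() {
        List<String> result = locations;
        if (result == null) {
            synchronized (this) {
                result = locations;
                if (result == null) {
                    // Copy the array so later changes by the caller cannot leak in.
                    result = Collections.unmodifiableList(Arrays.asList(loader.get().clone()));
                    locations = result;
                }
            }
        }
        return result;
    }
}

// Cache 2 (hypothetical): short-lived cache of snapshot directory listings.
final class SnapshotListCache {
    private static final class Entry {
        final List<String> files;
        final long expiresAtNanos;

        Entry(List<String> files, long expiresAtNanos) {
            this.files = files;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Function<String, List<String>> lister; // performs the real directory listing
    private final long ttlNanos;
    private final boolean enabled;
    private final LongSupplier clock; // for example System::nanoTime

    SnapshotListCache(Function<String, List<String>> lister, long ttlNanos,
                      boolean enabled, LongSupplier clock) {
        this.lister = lister;
        this.ttlNanos = ttlNanos;
        this.enabled = enabled;
        this.clock = clock;
    }

    List<String> list(String snapshotDir) {
        if (!enabled) {
            // Cache disabled, e.g. when snapshot listings are expected to be large.
            return lister.apply(snapshotDir);
        }
        long now = clock.getAsLong();
        Entry entry = entries.get(snapshotDir);
        if (entry == null || now - entry.expiresAtNanos >= 0) {
            // Missing or expired: go to the filesystem once and remember the result.
            entry = new Entry(lister.apply(snapshotDir), now + ttlNanos);
            entries.put(snapshotDir, entry);
        }
        return entry.files;
    }
}
```

      This sketch keeps expired entries in the map until the same directory is requested again, and concurrent callers may list the same directory twice on a miss. A production cache would likely bound its size and evict entries, for instance by using an existing caching library.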

            People

              frankgh Francisco Guerrero
              Francisco Guerrero
              Yifan Cai
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: