  1. Cassandra
  2. CASSANDRA-14229

Separate data drive for Index.db files


    • Type: New Feature
    • Status: Open
    • Priority: Low
    • Resolution: Unresolved
    • Fix Version/s: None
    • Labels:


      For datasets whose active set of keys far exceeds RAM, it would be quite useful to be able to put certain sstable files (e.g. *-Index.db) on a separate, faster drive (or drives) than the data, e.g. indexes on SSD and data on HDD. This is particularly valuable when keys are much smaller than values. Also, as RAM continues to get more expensive, users who currently optimize by having large key caches may not need to buy as much of it.

      Our use case is a large dataset like this one. Storing all the data on SSD is cost-prohibitive, and the reads are extremely random (effectively every key is in the active set), so we don't have enough RAM to cache it. (I did try using a massive key cache, 64GB, and was seeing strange behavior anyway: the irqbalancer process pegged the CPU and the whole thing badly underperformed. An investigation for another day.)

      At the moment our only option is to buy enough HDDs to handle 2 seeks per read: 1 for the index and 1 for the data. Having the indexes on SSD would speed this up considerably, and in practice we would only need to purchase a small number of SSDs and about half as many HDDs.
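
The "half as many HDDs" estimate follows from simple seek budgeting. Below is a rough sketch of that arithmetic; the read rate and per-drive IOPS figures are made-up placeholders rather than measurements from our cluster, and the model assumes HDD count is driven purely by random seeks, not by capacity.

```python
import math

def hdds_needed(reads_per_sec, hdd_seeks_per_read, hdd_iops):
    """HDDs required to absorb the random seeks generated by the read load."""
    return math.ceil(reads_per_sec * hdd_seeks_per_read / hdd_iops)

# Hypothetical workload numbers, for illustration only.
reads_per_sec = 10_000   # assumed cluster-wide random reads per second
hdd_iops = 100           # assumed random IOPS a single HDD can sustain

# Index and data both on HDD: 2 HDD seeks per read.
both_on_hdd = hdds_needed(reads_per_sec, 2, hdd_iops)

# Index on SSD, data on HDD: only the data seek hits the HDDs.
index_on_ssd = hdds_needed(reads_per_sec, 1, hdd_iops)

print(both_on_hdd, index_on_ssd)
```

Under these assumptions, removing the index seek from the HDDs halves the number of spindles needed, whatever the actual read rate is.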

      One user suggested lvmcache, which could work. I'd like to hear if this will really work optimally and if lvmcache will really keep the right blocks on the faster volume, and how reliable it is at the task.

      Note: I asked about this on the mailing list, and it was suggested that I create a JIRA.




            • Assignee:
              dkinder Dan Kinder


              • Created: