Details
Improvement
Status: Resolved
Low
Resolution: Fixed
None
Operability
Normal
All
None
Description
I am not currently aware of any way to find out where a CF (table) stores its data on disk. While this might look like a detail, it is important for backup and restore purposes. Let's consider this workflow:
1) There is a keyspace "abc" with table "def"; on disk, it looks like /my/data/abc/def-12345/...
2) I take a backup; all SSTables are put somewhere under the path /backups/abc/def-12345/...
3) I drop this table via CQL; the data ends up in a "dropped" snapshot.
4) I create this table again, but now it gets a different ID, e.g. /my/data/abc/def-6789/...
5) I want to restore /backups/abc/def-12345/..., but there are now two directory structures on disk:
├── data
│   ├── abc
│   │   ├── def-12345...
│   │   │   ├── backups
│   │   │   └── snapshots
│   │   │       └── dropped-1607699318139-ghi
│   │   │           ├── manifest.json
│   │   │           ├── na-1-big-CompressionInfo.db
│   │   │           ├── na-1-big-Data.db
│   │   │           ├── na-1-big-Digest.crc32
│   │   │           ├── na-1-big-Filter.db
│   │   │           ├── na-1-big-Index.db
│   │   │           ├── na-1-big-Statistics.db
│   │   │           ├── na-1-big-Summary.db
│   │   │           ├── na-1-big-TOC.txt
│   │   │           └── schema.cql
│   │   └── def-6789...
│   │       ├── backups
│   │       ├── na-1-big-CompressionInfo.db
│   │       ├── na-1-big-Data.db
│   │       ├── na-1-big-Digest.crc32
│   │       ├── na-1-big-Filter.db
│   │       ├── na-1-big-Index.db
│   │       ├── na-1-big-Statistics.db
│   │       ├── na-1-big-Summary.db
│   │       └── na-1-big-TOC.txt
The question now is: which directory should I restore this into? The "active" one, of course, but I cannot know which one that is, because one of them is no longer used, and I do not want to do something smelly like listing directories on disk and checking which one does not contain a "dropped" directory. Yes, one might import the SSTables, which was introduced in Cassandra 4, but on Cassandra 3 one can only copy the files over (or hard-link them) and refresh.
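To make the "smelly" workaround concrete, here is a minimal sketch of that directory-listing heuristic. The class and method names are hypothetical, and it assumes the layout shown above (table directories named `<table>-<id>`, with dropped data under `snapshots/dropped-*`). It is shown only to illustrate why this is fragile, not as a recommended approach.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: guesses which table directory is "active" purely from the disk layout.
final class ActiveTableDirGuess {

    /** Returns directories under keyspaceDir named "<table>-*" that hold no "dropped-*" snapshot. */
    static List<Path> guessActiveDirs(Path keyspaceDir, String table) throws IOException {
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(keyspaceDir, table + "-*")) {
            for (Path dir : dirs) {
                if (Files.isDirectory(dir) && !hasDroppedSnapshot(dir)) {
                    candidates.add(dir);
                }
            }
        }
        return candidates;
    }

    /** True if tableDir/snapshots contains at least one entry whose name starts with "dropped-". */
    private static boolean hasDroppedSnapshot(Path tableDir) throws IOException {
        Path snapshots = tableDir.resolve("snapshots");
        if (!Files.isDirectory(snapshots)) {
            return false;
        }
        try (DirectoryStream<Path> dropped = Files.newDirectoryStream(snapshots, "dropped-*")) {
            return dropped.iterator().hasNext();
        }
    }
}
```

The heuristic can return zero or several candidates (for example, if the "dropped" snapshot was cleared, or if auto-snapshot on drop is disabled), which is exactly why asking the node itself would be preferable.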
The second scenario is like this:
There is just one "active" table directory and no structure with a "dropped" directory exists, but its ID (the part after the table name) differs from the one in the backup. If I want to copy files over and refresh, I need to resolve this discrepancy and copy the SSTables into a directory whose ID suffix differs from the ID in the backup.
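The copy-or-hard-link step could be sketched as follows. `SSTableLinker` and `linkInto` are hypothetical names; the sketch assumes the backup and the live data directory are on the same filesystem (hard links do not work across filesystems) and that the live directory has already been identified, which is the very piece of information that is missing today.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: hard-links backed-up SSTable files into a live table directory.
final class SSTableLinker {

    /** Hard-links every regular file in backupDir into liveDir and returns the number of links created. */
    static int linkInto(Path backupDir, Path liveDir) throws IOException {
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(backupDir)) {
            for (Path source : files) {
                if (!Files.isRegularFile(source)) {
                    continue; // skip nested directories such as "backups" or "snapshots"
                }
                Path target = liveDir.resolve(source.getFileName());
                if (Files.exists(target)) {
                    // Never clobber an SSTable component that the node may be using.
                    throw new IOException("Refusing to overwrite " + target);
                }
                Files.createLink(target, source); // createLink(link, existing)
                linked++;
            }
        }
        return linked;
    }
}
```

After linking, something like `nodetool refresh abc def` would make the node pick up the new files. Note that file names can collide (both directories in the listing above contain `na-1-big-*` files), in which case this sketch simply stops rather than renaming anything.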
I tried to get this information from ColumnFamilyStoreMBean (CFSMB), but it is not exposed there.
Is there any way to retrieve via JMX where a table actually stores its data?
I have put this together: https://github.com/apache/cassandra/pull/850/files
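For illustration, once a table MBean exposes such an attribute, a client could read it over JMX roughly as sketched below. The class name, the method names, and the attribute name passed by the caller are hypothetical. The `type=Tables` object name pattern is an assumption about how table MBeans are registered (older releases use `type=ColumnFamilies` with a `columnfamily` key), and 7199 is Cassandra's usual default JMX port.

```java
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical client: reads one attribute from a table's MBean over JMX.
final class TableMBeanQuery {

    /** Builds the (assumed) object name of the MBean for keyspace.table. */
    static ObjectName tableObjectName(String keyspace, String table) throws MalformedObjectNameException {
        return new ObjectName("org.apache.cassandra.db:type=Tables,keyspace=" + keyspace + ",table=" + table);
    }

    /** Connects to host:port over JMX/RMI and returns the value of the named attribute. */
    static Object readAttribute(String host, int port, String keyspace, String table, String attribute)
            throws Exception {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            return connection.getAttribute(tableObjectName(keyspace, table), attribute);
        }
    }
}
```

A call such as `TableMBeanQuery.readAttribute("127.0.0.1", 7199, "abc", "def", "DataPaths")` would then return the directories in use, where `DataPaths` is only a placeholder for whatever attribute name the MBean ends up exposing.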