Apache Cassandra / CASSANDRA-16335

Expose data dirs in ColumnFamilyStoreMBean


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • Fix Version/s: 4.0-rc1, 4.0
    • Component/s: Local/Config
    • Labels: None

    Description

      As far as I know, there is currently no way to find out where a table (column family) stores its data. While this might look like a detail, it matters for backup and restore. Let's consider this workflow:

      1) There is a keyspace "abc" with a table "def"; on disk, it looks like /my/data/abc/def-12345/...

      2) I take a backup; all SSTables are put somewhere under /backups/abc/def-12345/...

      3) I drop this table via CQL; its data ends up in a "dropped-..." snapshot directory.

      4) I create the table again, but now it gets a different ID, e.g. /my/data/abc/def-6789/...

      5) I want to restore /backups/abc/def-12345/..., but now there are two directories for the same table:

      ├── data
      │   ├── abc
      │   │   ├── def-12345...
      │   │   │   ├── backups
      │   │   │   └── snapshots
      │   │   │       └── dropped-1607699318139-ghi
      │   │   │           ├── manifest.json
      │   │   │           ├── na-1-big-CompressionInfo.db
      │   │   │           ├── na-1-big-Data.db
      │   │   │           ├── na-1-big-Digest.crc32
      │   │   │           ├── na-1-big-Filter.db
      │   │   │           ├── na-1-big-Index.db
      │   │   │           ├── na-1-big-Statistics.db
      │   │   │           ├── na-1-big-Summary.db
      │   │   │           ├── na-1-big-TOC.txt
      │   │   │           └── schema.cql
      │   │   └── def-6789...
      │   │       ├── backups
      │   │       ├── na-1-big-CompressionInfo.db
      │   │       ├── na-1-big-Data.db
      │   │       ├── na-1-big-Digest.crc32
      │   │       ├── na-1-big-Filter.db
      │   │       ├── na-1-big-Index.db
      │   │       ├── na-1-big-Statistics.db
      │   │       ├── na-1-big-Summary.db
      │   │       └── na-1-big-TOC.txt
      

      The question now is: which directory should I restore into? The "active" one, of course, but I cannot reliably tell which one that is, because one of them is no longer used, and I do not want to do something as smelly as listing directories on disk and checking which one does not contain a "dropped" directory. Yes, one could import SSTables instead, but that feature was introduced in Cassandra 4; on Cassandra 3 the only options are to copy the files over, or to hard-link them, and then run a refresh.

      The second scenario is this:

      There is just one "active" table and no "dropped" directory exists, but its ID (the part after the table name) differs from the one in the backup. To copy the files over and refresh, I need to resolve this discrepancy and copy the SSTables into a directory whose ID suffix differs from the ID in the backup path.
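      The discrepancy in this second scenario boils down to a path mapping: keep each SSTable file name from the backup and swap the backup's table directory (def-12345) for the live one (def-6789). Below is a minimal Java sketch of that mapping. It assumes the live table directory is already known (which is exactly what this issue asks JMX to expose) and the <keyspace>/<table>-<id>/<file> layout shown above; RestoreTarget and its methods are hypothetical helpers, not part of Cassandra.

```java
import java.nio.file.Path;

// Hypothetical helper (not a Cassandra API): maps a backed-up SSTable onto the
// live table directory, whose ID suffix may differ from the backup's.
final class RestoreTarget {
    private RestoreTarget() {}

    // Returns the last path element as a string, failing on null or root paths.
    private static String name(Path p) {
        Path fileName = (p == null) ? null : p.getFileName();
        if (fileName == null) {
            throw new IllegalArgumentException("path has no name element: " + p);
        }
        return fileName.toString();
    }

    // Strips the "-<id>" suffix from a table directory name such as "def-12345".
    // Relies on Cassandra table names being limited to letters, digits and '_'.
    static String tableName(Path tableDir) {
        String dir = name(tableDir);
        int dash = dir.indexOf('-');
        return dash < 0 ? dir : dir.substring(0, dash);
    }

    // Assumes the layout <keyspace>/<table>-<id>/<sstable file> for the backup
    // and <keyspace>/<table>-<id> for the live directory.
    static Path targetFor(Path backupSSTable, Path liveTableDir) {
        Path backupTableDir = backupSSTable.getParent();
        if (backupTableDir == null) {
            throw new IllegalArgumentException("no table directory: " + backupSSTable);
        }
        String backupKs = name(backupTableDir.getParent());
        String liveKs = name(liveTableDir.getParent());
        // Refuse to restore into a different keyspace or table.
        if (!backupKs.equals(liveKs)
                || !tableName(backupTableDir).equals(tableName(liveTableDir))) {
            throw new IllegalArgumentException(
                "backup " + backupTableDir + " does not belong to " + liveTableDir);
        }
        // Keep the SSTable file name; replace only the table directory.
        return liveTableDir.resolve(name(backupSSTable));
    }
}
```

      For example, on a POSIX file system, RestoreTarget.targetFor(Paths.get("/backups/abc/def-12345/na-1-big-Data.db"), Paths.get("/my/data/abc/def-6789")) yields /my/data/abc/def-6789/na-1-big-Data.db; the files could then be copied (Files.copy) or hard-linked (Files.createLink) there before running a refresh. The hard part remains knowing the live directory in the first place.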

      I tried to get this information from ColumnFamilyStoreMBean, but it is not exposed there.

      Is there any way to retrieve, via JMX, where a table actually stores its data?

      I have put a patch together: https://github.com/apache/cassandra/pull/850/files
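      Once the paths are exposed on the MBean, a client can read them over plain JMX. The Java sketch below is a hedged illustration, not the merged API: it assumes the table MBean is registered as org.apache.cassandra.db:type=Tables,keyspace=<ks>,table=<table> (the 4.0 naming), that the new getter surfaces as a JMX attribute named DataPaths holding a list of strings, and that JMX listens on the default port 7199 without authentication or SSL. TableDataPaths is a hypothetical class; check the attribute name against the actual change before relying on it.

```java
import java.util.Arrays;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical client: asks a Cassandra node where a table keeps its data.
final class TableDataPaths {
    private TableDataPaths() {}

    // Builds the (assumed) 4.0-style table MBean name. Keyspace and table names
    // are restricted to [A-Za-z0-9_], so no ObjectName quoting is needed.
    static ObjectName tableMBean(String keyspace, String table) {
        try {
            return new ObjectName(
                "org.apache.cassandra.db:type=Tables,keyspace=" + keyspace + ",table=" + table);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("bad keyspace/table: " + keyspace + "." + table, e);
        }
    }

    // ASSUMPTION: the attribute is called "DataPaths". No credentials are sent.
    @SuppressWarnings("unchecked")
    static List<String> dataPaths(String host, int port, String keyspace, String table)
            throws Exception {
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            Object value = mbs.getAttribute(tableMBean(keyspace, table), "DataPaths");
            // Accept either a String[] or a List<String>, depending on how it is exposed.
            if (value instanceof String[]) {
                return Arrays.asList((String[]) value);
            }
            return (List<String>) value;
        }
    }
}
```

      Against a node with such an attribute, TableDataPaths.dataPaths("127.0.0.1", 7199, "abc", "def") should list the live table directory under each configured data directory (in the example above, the one ending in def-6789), which is the restore target that the workflow above could not determine.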

          People

            Stefan Miklosovic
            Marcus Eriksson
            Votes: 0
            Watchers: 4


              Time Tracking
                Original Estimate: Not Specified
                Remaining: 0h
                Logged: 0.5h
