Apache Cassandra / CASSANDRA-14587

Deduplicate sstables shared by multiple snapshots when computing true disk space used


Details

    • Type: Improvement
    • Status: Open
    • Priority: Low
    • Resolution: Unresolved
    • None
    • Component/s: Tool/nodetool
    • None
    • Environment: Debian 8, Cassandra 3.11.2

    • Docs

    Description

      Running 'nodetool listsnapshots' seems to overcount "TrueDiskSpaceUsed" under some circumstances, specifically when there is a large number of snapshots. I suspect that it does not deduplicate space used when multiple snapshots share sstables that are not part of the current table.

      Results of "nodetool listsnapshots":
      Total TrueDiskSpaceUsed: 396.11 MiB
      Results of "du -hcs" on the table's directory:
      18M    total

      This is with 50+ snapshots (one per minute), each taken with "nodetool snapshot -t <datestamp> -sf --column-family <tablename> <keyspace>".
      The result of "du -hcs -L <directory>" comes out pretty close to the "TrueDiskSpaceUsed" figure.

      I have only tested against 3.11.2, but have no reason to believe it's unique to that version or even 3.x.
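      A minimal sketch of the deduplication the title asks for, written in Python purely for illustration. It assumes a simplified, hypothetical layout: live sstable files sit directly in the table directory and snapshots live under snapshots/<tag>/. Because snapshots are hard links, files are identified by (st_dev, st_ino) rather than by name, so an sstable shared by many snapshots is counted once, and anything still linked from the live table is skipped. Both function names are hypothetical: naive_snapshot_bytes is a per-snapshot sum shown only for comparison with the overcounting suspected above, and neither function is Cassandra's actual code or API.

```python
import os


def _file_key(path: str) -> tuple:
    # (device, inode) identifies the underlying file, so hard links share a key.
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)


def _live_keys(table_dir: str) -> set:
    # Assumption: live sstable components sit directly in the table directory.
    return {
        _file_key(e.path)
        for e in os.scandir(table_dir)
        if e.is_file(follow_symlinks=False)
    }


def _snapshot_files(table_dir: str):
    # Assumption: snapshots live under <table_dir>/snapshots/<tag>/.
    # os.walk yields nothing if the snapshots directory does not exist.
    root = os.path.join(table_dir, "snapshots")
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            yield os.path.join(dirpath, name)


def naive_snapshot_bytes(table_dir: str) -> int:
    # Counts a non-live file once per snapshot that links it (overcounts).
    live = _live_keys(table_dir)
    return sum(
        os.lstat(p).st_size
        for p in _snapshot_files(table_dir)
        if _file_key(p) not in live
    )


def true_snapshot_bytes(table_dir: str) -> int:
    # Counts each non-live file once, however many snapshots hard-link it.
    live = _live_keys(table_dir)
    seen = set()
    total = 0
    for p in _snapshot_files(table_dir):
        st = os.lstat(p)
        key = (st.st_dev, st.st_ino)
        if key in live or key in seen:
            continue
        seen.add(key)
        total += st.st_size
    return total
```

      With two snapshots that both hard-link the same 200-byte compacted-away sstable, the naive sum reports 400 bytes while the deduplicated sum reports 200. Keying on the inode mirrors how plain "du" (without "-l") counts a hard-linked file only once per invocation.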


            People

              Assignee: Unassigned
              Reporter: Elliott Sims
              Votes: 0
              Watchers: 8
