Cassandra / CASSANDRA-1983

Make sstable filenames contain a UUID instead of increasing integer

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      sstable filenames look like CFName-1569-Index.db, containing an integer for uniqueness. This makes it possible (however unlikely) that the integer could overflow, which could be a problem. It also makes it difficult to collapse multiple nodes into a single one with rsync. I do this occasionally for testing: I'll copy our 20-node cluster onto only 3 nodes by copying all of the data files and running cleanup; at present this requires a manual step of uniquifying the overlapping sstable names. It would be handy if, instead of an incrementing integer, these contained a UUID or some such that guarantees uniqueness across the cluster.
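
To make the manual step concrete, here is a minimal sketch of how the renaming could be planned before copying files from another node. It assumes the CFName-<integer>-<Component>.db layout from the example above; the class and method names (SSTableRenumber, planRenames) are hypothetical and are not part of Cassandra.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SSTableRenumber {
    // Assumed layout, taken from the example in the description: CFName-1569-Index.db
    private static final Pattern NAME = Pattern.compile("(.+)-(\\d+)-(\\w+)\\.db");

    /**
     * Plans new names for files copied from one source node so that every
     * integer they use is above maxLocalGeneration. Components that shared
     * an integer on the source node (Data, Index, ...) still share one afterwards.
     */
    static Map<String, String> planRenames(List<String> incoming, int maxLocalGeneration) {
        Map<String, Integer> remapped = new LinkedHashMap<>(); // "CFName-oldInt" -> new integer
        Map<String, String> plan = new LinkedHashMap<>();      // old filename -> new filename
        int next = maxLocalGeneration;
        for (String name : incoming) {
            Matcher m = NAME.matcher(name);
            if (!m.matches()) {
                continue; // not an sstable component, leave it alone
            }
            String key = m.group(1) + "-" + m.group(2);
            Integer newGen = remapped.get(key);
            if (newGen == null) {
                // addExact throws ArithmeticException instead of silently wrapping on overflow
                next = Math.addExact(next, 1);
                newGen = next;
                remapped.put(key, newGen);
            }
            plan.put(name, m.group(1) + "-" + newGen + "-" + m.group(3) + ".db");
        }
        return plan;
    }
}
```

The plan is computed on names only, so it can be reviewed before any file is moved. With cluster-wide unique names this step would not be needed at all.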

        Activity

        Ryan King added a comment -

        Alternatively, since we'll need a host->uuid mapping for counters we can put that uuid in the filename along with a serial integer (make it a long and we should be ok, right?)

        David King added a comment -

        As long as the host is still willing to read filenames without its own uuid, sure
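
A rough sketch of what the two comments above describe: a name that carries a host UUID plus a long serial, with a parser that still accepts names without a UUID. The layout CFName-<hostUuid>-<serial>-<Component>.db and the class HostScopedName are assumptions made for illustration only; nothing in this ticket fixes an actual format.

```java
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HostScopedName {
    private static final String UUID_RE =
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}";
    // The UUID segment is optional so that names without one still parse.
    private static final Pattern NAME =
            Pattern.compile("(.+?)-(?:(" + UUID_RE + ")-)?(\\d+)-(\\w+)\\.db");

    final String cfName;
    final UUID hostId;      // null for names without a host UUID
    final long serial;      // a long, as suggested, rather than an int
    final String component;

    HostScopedName(String cfName, UUID hostId, long serial, String component) {
        this.cfName = cfName;
        this.hostId = hostId;
        this.serial = serial;
        this.component = component;
    }

    static HostScopedName parse(String filename) {
        Matcher m = NAME.matcher(filename);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an sstable filename: " + filename);
        }
        UUID host = m.group(2) == null ? null : UUID.fromString(m.group(2));
        return new HostScopedName(m.group(1), host, Long.parseLong(m.group(3)), m.group(4));
    }

    String filename() {
        String hostPart = hostId == null ? "" : hostId + "-";
        return cfName + "-" + hostPart + serial + "-" + component + ".db";
    }
}
```

Because the parser treats the UUID as optional, a node could keep reading files written under the old scheme, or by another host, while writing new files under its own UUID.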

        Robert Coli added a comment -

        +1 here. Global uniqueness for sstable names would make many copy-the-sstables style maintenance operations easier, as you wouldn't have to manually resolve the namespace conflict. Just now I saw someone in #cassandra who was setting up a cluster with a copy of data get confused by non-unique filenames being overwritten on his new cluster. The only downside seems to be longer sstable file names.

        Daniel Shelepov added a comment -

        Is this still needed? Naming in 2.0+ is still incremental as far as I can tell.

        I'd like to work on this fix while I'm learning the codebase.

        Jonathan Ellis added a comment -

        Yes.

        Daniel Shelepov added a comment -

        Notes so far:

        • sstable filenames are controlled by the io/sstable/Descriptor class, which encapsulates a few parameters including "generation" – the increasing integer in question.
        • dropping generation in favor of a uuid seems questionable, given that generation is used by a wide variety of clients in the codebase. So the most likely approach is uuid + generation side by side.
        • using the host id as the uuid is easy conceptually, but will violate layering, because code in io will start to depend on db and/or service. Plus there is a potential bootstrapping problem where system sstables need to be initialized early on during boot, and it's not clear whether the unique host id is available early enough to feed into system sstable descriptors.
        • random uuids are also tricky, because sstable names will no longer be discoverable without directory lookups. Some code (particularly in unit tests) leans on the ability to synthesize sstable names without touching the filesystem. It's possible to persist these uuids in one of the system tables, but it will have to be a local table, and, regardless, changing system schema can make this a breaking change.

        I haven't yet found a cost-effective fix that would involve actually modifying the existing naming scheme.

        The latest idea I have is to create a directory that will hold symlinks to real sstables (symlinks are available in Java 7). Symlink names will contain the UUIDs. The only extra piece of code would be creating and tearing down symlinks when real sstables are created and deleted. End users could then access sstables through this symlink directory whenever doing related maintenance. The last piece would be making sure that appropriate clients, such as the compactor, can consume sstables with and without UUIDs.
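
A minimal sketch of the symlink idea, assuming a separate link directory and a UUID prefix on each link name. The class SSTableLinks and the naming convention are hypothetical; only the java.nio.file calls are real Java 7 APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

final class SSTableLinks {
    private final Path linkDir;

    SSTableLinks(Path linkDir) {
        this.linkDir = linkDir;
    }

    /** Creates a UUID-named symlink pointing at a real sstable file. */
    Path link(Path sstableFile, UUID id) throws IOException {
        Files.createDirectories(linkDir);
        return Files.createSymbolicLink(linkName(sstableFile, id), sstableFile.toAbsolutePath());
    }

    /** Removes the symlink only; the real sstable file is left untouched. */
    boolean unlink(Path sstableFile, UUID id) throws IOException {
        return Files.deleteIfExists(linkName(sstableFile, id));
    }

    // Assumed convention for this sketch: <uuid>-<original file name>
    private Path linkName(Path sstableFile, UUID id) {
        return linkDir.resolve(id + "-" + sstableFile.getFileName());
    }
}
```

The real files keep their existing names, so code that synthesizes sstable names is unaffected. Creating symlinks may need extra privileges on some platforms, which this sketch does not handle.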

        I'll work on this some more tomorrow, but it'll probably spill into next week (or later).

        Comments welcome.

        Jonathan Ellis added a comment -

        "The only extra piece of code would be creating and tearing down symlinks when real sstables are created and deleted."

        That doesn't really buy us anything for the "I want to merge in some sstables from an external source" problem though; just changes the constraint from "distinct filenames" to "distinct symlink names."

        Created CASSANDRA-6719 to supersede this.

        Daniel Shelepov added a comment (edited) -

        OK, but symlinks are much easier to make unique, because they won't affect all that code that expects to find sstables under well-known names (regular names still being available in regular sstable storage). The fact that they're symlinks allows decoupling the problem from internal naming requirements.


          People

          • Assignee: Unassigned
          • Reporter: David King
          • Votes: 3
          • Watchers: 10
