Attached patches for this at https://github.com/pcmanus/cassandra/commits/3237-1.
This ain't small so I'll try to explain the main idea here.
The main idea is that internally, super column families are handled for almost all intents and purposes as if their comparator were a simple CompositeType with 2 components: the 1st one is the old super column name, the 2nd one the old sub-column name. This means they are largely not special anymore and all the super column specific code goes away (including SuperColumn.java).
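To make the mapping concrete, here is a minimal sketch of how a (super column name, sub-column name) pair could be packed into a single composite cell name. It follows the usual CompositeType layout (2-byte length, value, 1 end-of-component byte per component), but the class and method names (CompositeNameSketch, compose, split) are made up for illustration and are not the code in the patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not the CompositeType class from the patch.
public class CompositeNameSketch {
    // Pack (superName, subName) into one cell name.
    // Each component is written as: 2-byte length, raw bytes, 1 end-of-component byte.
    static ByteBuffer compose(byte[] superName, byte[] subName) {
        ByteBuffer out = ByteBuffer.allocate(2 * 3 + superName.length + subName.length);
        for (byte[] component : new byte[][] { superName, subName }) {
            out.putShort((short) component.length); // assumes component < 64KB
            out.put(component);
            out.put((byte) 0); // end-of-component marker
        }
        out.flip();
        return out;
    }

    // Recover the two components from a name built by compose().
    static byte[][] split(ByteBuffer name) {
        ByteBuffer in = name.duplicate(); // leave the caller's position untouched
        byte[][] parts = new byte[2][];
        for (int i = 0; i < 2; i++) {
            int length = in.getShort() & 0xFFFF;
            parts[i] = new byte[length];
            in.get(parts[i]);
            in.get(); // skip end-of-component marker
        }
        return parts;
    }

    public static void main(String[] args) {
        ByteBuffer name = compose("sc1".getBytes(StandardCharsets.UTF_8),
                                  "a".getBytes(StandardCharsets.UTF_8));
        byte[][] parts = split(name);
        System.out.println(new String(parts[0], StandardCharsets.UTF_8) + " / "
                           + new String(parts[1], StandardCharsets.UTF_8));
    }
}
```

With names built this way, a plain byte-ordered comparison of the first component groups all cells of the same old super column together, which is what lets the rest of the code treat the CF as a regular composite one.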
Now, for compatibility's sake, the main action is in the new SuperColumns.java class. This class contains a bunch of static methods that:
- deserialize the old super column format directly into the new composite based CF.
- serialize the new composite based CF to the old super column format.
- convert 'super column query filters' to and from 'composite based query filters'.
Then in ColumnFamilySerializer and the ReadCommand serializer, we use those static methods when talking to old nodes (and a super column family is involved). We also convert thrift SC queries into equivalent ones on the new composite format in CassandraServer.java.
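As a rough picture of what those conversions do, the sketch below flattens an old-style nested layout (super column -> sub-column -> value) into composite-keyed cells and nests it back. It uses plain Java collections and strings; SuperColumnLayoutSketch, flatten and nest are hypothetical names, and the real SuperColumns.java methods work on the serialized formats rather than on maps.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only: the real conversions operate on wire formats.
public class SuperColumnLayoutSketch {
    // Old layout -> new layout: one cell per (superName, subName) pair.
    static Map<List<String>, String> flatten(SortedMap<String, SortedMap<String, String>> superColumns) {
        Map<List<String>, String> cells = new LinkedHashMap<>();
        for (Map.Entry<String, SortedMap<String, String>> sc : superColumns.entrySet())
            for (Map.Entry<String, String> sub : sc.getValue().entrySet())
                cells.put(Arrays.asList(sc.getKey(), sub.getKey()), sub.getValue());
        return cells;
    }

    // New layout -> old layout: group cells by the first composite component.
    static SortedMap<String, SortedMap<String, String>> nest(Map<List<String>, String> cells) {
        SortedMap<String, SortedMap<String, String>> superColumns = new TreeMap<>();
        for (Map.Entry<List<String>, String> cell : cells.entrySet())
            superColumns.computeIfAbsent(cell.getKey().get(0), k -> new TreeMap<>())
                        .put(cell.getKey().get(1), cell.getValue());
        return superColumns;
    }

    public static void main(String[] args) {
        SortedMap<String, SortedMap<String, String>> old = new TreeMap<>();
        old.computeIfAbsent("sc1", k -> new TreeMap<>()).put("a", "1");
        old.computeIfAbsent("sc1", k -> new TreeMap<>()).put("b", "2");
        old.computeIfAbsent("sc2", k -> new TreeMap<>()).put("a", "3");
        Map<List<String>, String> cells = flatten(old);
        System.out.println(cells);
        System.out.println(nest(cells).equals(old));
    }
}
```

The round trip being lossless is the property that matters here: an old node should not be able to tell whether the data it receives was stored in the old or the new layout.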
The patch also doesn't shy away from removing abstractions that are not necessary anymore once super columns are removed. Most notably:
- QueryPath is no longer used. It was honestly already kind of useless with super columns, but even more so without them. It was also error-prone imho, because some methods that took a QueryPath were actually ignoring everything except the columnFamilyName, for instance. I note that the class itself is not deleted; it is kept only to simplify wire compatibility with old nodes.
- IColumn and IColumnContainer are removed.
We could also merge ColumnFamily and AbstractColumnContainer, but I've left that for later.
As far as testing goes:
- the unit tests pass more or less. There's CassandraServerTest that times out on my box, but it does so on trunk too (seems to be the JVM that doesn't exit properly). There are also a few serializationTest failures, but they seem to be more related to the fact that the patch bumps the messaging version than anything else. I'll look at that later.
- our old functional tests (in test/system) pass. Again, there are a few failures, but those are tests that assume CollatingOrderedPartitioner (apparently nobody has run those tests in a while). Anyway, those tests test the thrift API for super columns fairly thoroughly.
- you can now access super column families from CQL3.
- I've also (briefly) tested wire compatibility and that you can do super column queries in a mixed version cluster.
Regarding the CQL3 support, SCFs for which column_metadata has been defined on the subcolumns are handled almost like sparse CFs. The "almost" is because I've made sure we don't write a row marker as we do for sparse CFs, because that would break backward compatibility (there is no way to have a column with an empty name in a super column). For the same reason, collections are not supported either.
One small downside that I need to note is that during the upgrade from 1.2 to 2.0, there might be a noticeable latency increase in super column queries. The reason is that any read query that mixes nodes from before and after this change will have a digest mismatch (and so will re-query with the full data). Indeed, digests are not versioned and cannot really be (not easily at least).
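One plausible way to picture the mismatch, assuming the two versions feed differently shaped bytes into the hash for the same logical data: in the toy sketch below, the "old" digest hashes the super column name once followed by its sub-columns, while the "new" one hashes every cell with its full (super, sub) name. The exact bytes each version hashes are an assumption here, and DigestMismatchSketch and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration only: the hashed byte layouts are assumptions.
public class DigestMismatchSketch {
    static byte[] md5(String... chunks) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String chunk : chunks)
                md.update(chunk.getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java platform
        }
    }

    // Old-style: super column name once, then each sub-column name and value.
    static byte[] oldStyleDigest() {
        return md5("sc1", "a", "1", "b", "2");
    }

    // New-style: every cell carries its full (super, sub) name.
    static byte[] newStyleDigest() {
        return md5("sc1", "a", "1", "sc1", "b", "2");
    }

    public static void main(String[] args) {
        // Same logical data, different hashed bytes, so the digests differ.
        System.out.println(Arrays.equals(oldStyleDigest(), newStyleDigest()));
    }
}
```

Since the coordinator only compares the digests, it cannot tell "same data, different layout" from "different data", hence the fallback to a full data read.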