In the ctor, for isDense, we seem to force re-computation for non-SC non-dense tables (which happens to include non-compact tables).
Corrected it. This didn't cause any trouble (because was checking for an empty name, too), but now we'll recompute only for the supercolumn ones, which is much better.
But for I'm also not 100% sure this work in all upgrade scenario.
Thing is the re-computation is necessary only for the upgrade from 2.x -> current 3.x -> 3.x + this patch, in which case we have two clustering columns instead of one just a single "empty" value column, which is a collection. 3.x-created supercolumns are calculated correctly here.
You are right that some testing in that regard won't hurt, so I've added some 3-step upgrade tests with the scenarios similar to the current one. For the sakes of completeness, I've also added a test for current 3.x -created supercolumn families.
Unfortunately, while we were supposed to have a SC upgrade dtests
I've added some dtests as well as upgrade tests.
I'm not entirely sure if/how column renaming works for the SC key and value "fake" column
You're right: I did leave it off as I thought read/write path is the only support that's required for migrating off SCF, but adding this is trivial. I've added tests for all sorts of renames. However, there is no special-casing done in AlterStatement: getColumnDefinition would return the "fake" columns as well. removeColumn would be a no-op in case of fake columns and adding column would force re-initialisation and the new column would get picked up correctly.
In UpdateParameters#addCounter, not sure the change from CounterContext#createUpdate to CounterContext#createLocal is corret.
Yes, sorry, I have overlooked it.
In getSuperCfKeyColumn, not sure what "3.x-created supercolumn family" refers to (is that SC tables created from thrift in 3.x? or SC tables whose schema had been upgraded by a previous 3.x version?) and more importantly, why it would differ in number of clustering columns?
You are right, SC columns created from Thrift in 3.x. When upgrading from 2.x, you'll get column1 and column2 as clustering keys (which they kind of are). When created via thrift in 3.x, column2 is entirely virtual (which it kind of is).
There is a typo in a comment, where in "... can't be change to support inclusive slice ...", I believe you want "exclusive".
in the 2nd branch (else if ...), we should use result.add(cell) directly. Not only does that already handle counters, but more importantly, if we don't do that, things like ttl(v) will not work on the compact column value.
Good point. Fixed.
the last else is confusing: by definition we shouldn't be exposing any other column than the "compact value" column, so that branch should probably not be there (or be an assertion that we shouldn't get there).
You're right. This is an artefact of the previous incorrect dense table handling.
In the updates loop in prepareUpdateOperations, not sure why the 2 first cases seem to assume that cfm.isSuperColumnValueColumn(def), but the 3rd case tests it explicitely (but after having assigned superColumnValue). In general, not sure why the 3rd case looks different from the other 2. Also, I find it a bit hard to follow the mix of if using continue and sometime using else, sometimes not. I'm personally prefer using else when needed and remove all the continue. Lastly, seems the method could use a check for superColumnKey != null.
I've refactored this method even further. There were many things that were technically correct, but were added on earlier stages, so got there historically. Also, did a bit of further improvements with collectMarkerSpecifications
In prepareInsertForJSONOperations, I believe the 2nd call to getRawTermForColumn is unecessary, you can use the raw for before the if. Further, the overall code of this method feels very similar to the one in prepareInsertOperations: can we extract the bulk of what's common? (the loop seems to be operating on a ColumnDefinition and the associated value, so either pass an Iterable<Pair<...>>, or build 2 lists; I don't think we care about the performance difference here).
Refactored this part as well.
In prepareDeleteOperations, need to fix the "Maybe not single only" comment.
I think what was meant was "not only SingleColumnRelation", but this is not true: multi column operations are not allowed. After writing more tests, I did discover that multi-column restrictions were not working properly, so I have added support for them. Also, for IN queries.
n rebuildLWTConditions, I'd invert the initial if check (making it, imo, easier to read) with a direct return and "de-indent" the bulk of the method. Also, I'd remove the columnConditions.clear(): doesn't feel useful, and feels weird to do it here (what if we make that collection immutable in future?).
Good point, fixed
In isSuperColumnMapColumn, can replace the first condition by ColumnDefinition#isRegular (not something due to this patch, but a good occasion to simplify).
In SelectStatement, I believe that if you move the SuperColumnCompatibility#processPartition call at the beginning of SelectStatement#processPartition, you'd be able to remove the specific call in ModificationStatement#buildCasFailureResultSet (and generally reduce the changes of misuses).
for 'if-then-else', we never use brakets for one of the branch and not another (meaning that we only skip brackets if all of the branches are one-liner).
I hope I've fixed all the instances.
Slightly bummed by the addition of the SC special fields in StatementRestrictions. Can't we move most of this in SuperColumnCompatibility#getColumnFilter, by extracting things there (tbc, I'm genuninely not sure how easy change that is, and if it's too convoluted, I'm fine leaving things as they are; I just prefer having a much special casing in SuperColumnCompatibility as possible. I also assume we can't entirely remove all special casing from StatementRestrictions since we will at least make sure it doesn't complain about requiring ALLOW FILTERING if we leave the "superColumnKeyColumn" restriciton in)?
I agree this did not look pretty. At the time of writing that felt like it made sense. But after the current refactoring, I had to switch most of the things to restrictions anyways, and there are now more different special cases, so I have opted out for a class that would hold all the restriction special-cases. I've added a lot of documentation inline, hope this should suffice.
In ModificationStatement why not directly use cfm.superCfValueColumn instead of looping over cfm.allColumnsInSeelectOrder() to find it?
Good point, simplified that one, too.
A word on the latest changes in the patch: because we have to support all ALTER statements that are supported by 2.x, we got a bit more logic in getSuper(CfKey|Value)Column now. Also, rebuild was changed in order to avoid accidentally dropping the fake columns, since we have to make sure they do get persisted. In order to properly interface 2.x nodes, there are several special-cases in CFMetadata to make sure we compose Selection correctly. Another big change is the way WHERE clauses are handled, pretty much everywhere. Biggest reason for it is that previous way (having public variables in StatementRestrictions was too ad-hoc and many use-cases were missing. Now we can correctly remap multi-column restrictions on SC key (both EQ and slice), slice (and EQ) restrictions on the SC key and IN restriction. All of them are handled somewhat differently and some require filtering on the very last stage. Where possible I've tried to use collection bounds.