Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Fix Version/s: 1.1.1
    • Component/s: API
    • Labels:
      None

      Description

      It would be nice to add a native syntax for the use of ReversedType. I'm sure there is anything in SQL that we inspired ourselves from, so I would propose something like:

      CREATE TABLE timeseries (
        key text,
        time uuid,
        value text,
        PRIMARY KEY (key, time DESC)
      )
      

      Alternatively, the DESC could also be put after the column name definition but one argument for putting it in the PK instead is that this only apply to keys.

      1. 4004.txt
        6 kB
        Sylvain Lebresne
      2. 4004_alternative-v2.txt
        15 kB
        Sylvain Lebresne
      3. 4004_alternative.txt
        14 kB
        Sylvain Lebresne

        Issue Links

          Activity

          Hide
          Sylvain Lebresne added a comment -

          Attached simple patch implementing the syntax above (a simple test has been push in the dtests).

          Show
          Sylvain Lebresne added a comment - Attached simple patch implementing the syntax above (a simple test has been push in the dtests).
          Hide
          Jonathan Ellis added a comment -

          I think I'd prefer to move this into our "implementation extensions" section as WITH CLUSTERING ORDERED BY (time DESC) or something similar.

          I also don't see any code to account for, if we've already ordered it as DESC in the clustering declaration, that's going to affect what we need to do for ORDER BY DESC (or ASC).

          Show
          Jonathan Ellis added a comment - I think I'd prefer to move this into our "implementation extensions" section as WITH CLUSTERING ORDERED BY (time DESC) or something similar. I also don't see any code to account for, if we've already ordered it as DESC in the clustering declaration, that's going to affect what we need to do for ORDER BY DESC (or ASC).
          Hide
          Sylvain Lebresne added a comment -

          The idea of my patch is that adding DESC to a field in a table declaration changes the order of records logically (it also does it physically, but that is an implementation detail). The typical example (which I think is very common) is for time series, where you're often interested by having record "by default" in reverse time order (from the more recent to the older). In that case a SELECT without any ORDER BY would return records directly in reverse time order. So at least given that definition, no, I don't think there is anything to do for the code of ORDER BY in SELECT.

          I reckon there is another way to add ReversedType, which I'm assuming is what you are referring to, which consists in not changing the logical order, but to change the physical order. In which case, yes, we would have to take that into account for ORDER BY during SELECT.

          I'll admit I'm less of a fan of that second option though. It seems that changing the logical order is 1) more natural (to me at least obviously) and 2) avoids having to use ORDER BY DESC in all your query.

          I think I'd prefer to move this into our "implementation extensions" section as WITH CLUSTERING ORDERED BY (time DESC) or something similar

          Why not, but I do prefer "my" notation in that it's more concise and, at least if it changes the logical order, I guess it doesn't feel like being just an 'implementation extension'.

          Show
          Sylvain Lebresne added a comment - The idea of my patch is that adding DESC to a field in a table declaration changes the order of records logically (it also does it physically, but that is an implementation detail). The typical example (which I think is very common) is for time series, where you're often interested by having record "by default" in reverse time order (from the more recent to the older). In that case a SELECT without any ORDER BY would return records directly in reverse time order. So at least given that definition, no, I don't think there is anything to do for the code of ORDER BY in SELECT. I reckon there is another way to add ReversedType, which I'm assuming is what you are referring to, which consists in not changing the logical order, but to change the physical order. In which case, yes, we would have to take that into account for ORDER BY during SELECT. I'll admit I'm less of a fan of that second option though. It seems that changing the logical order is 1) more natural (to me at least obviously) and 2) avoids having to use ORDER BY DESC in all your query. I think I'd prefer to move this into our "implementation extensions" section as WITH CLUSTERING ORDERED BY (time DESC) or something similar Why not, but I do prefer "my" notation in that it's more concise and, at least if it changes the logical order, I guess it doesn't feel like being just an 'implementation extension'.
          Hide
          Jonathan Ellis added a comment -

          The idea of my patch is that adding DESC to a field in a table declaration changes the order of records logically

          That makes sense for the old world of ReversedType and reversed slice flag, but I don't think it makes sense for CQL. When I ask for "ORDER BY X DESC" I expect the largest X first, period. So a table declaration like this makes sense as an optimization if DESC is your most frequent query type, but it shouldn't change the semantics of the query itself.

          Show
          Jonathan Ellis added a comment - The idea of my patch is that adding DESC to a field in a table declaration changes the order of records logically That makes sense for the old world of ReversedType and reversed slice flag, but I don't think it makes sense for CQL. When I ask for "ORDER BY X DESC" I expect the largest X first, period. So a table declaration like this makes sense as an optimization if DESC is your most frequent query type, but it shouldn't change the semantics of the query itself.
          Hide
          Sylvain Lebresne added a comment -

          but I don't think it makes sense for CQL.

          Why wouldn't it? The notion of what is largest or smallest only make sense once you've defined what ordering you're talking about (what I'm calling the logical ordering). We do still allow in CQL custom orderings (which is useful), so why giving a simple syntax to define the reverse ordering or an existing one wouldn't make sense? With my first patch, "ORDER BY X DESC" does always return the largest X first, given the ordering.

          but it shouldn't change the semantics of the query itself

          To be precise, it doesn't change the semantic of the query, it changes the logical ordering (which happens to be the same than the physical one but that last part is an implementation detail) of records in the table.

          Now, looking more closely at the alternative of keeping the logical ordering unchanged but changing the physical ordering (in order to get faster reversed queries), I think this just doesn't work. And by "doesn't work", I mean that as soon as we have composites, it would be costly to implement (making it useless). Typically, suppose you follow that idea and declare:

          CREATE TABLE timeseries (
            key text,
            kind int,
            time timestamp,
            value text,
            PRIMARY KEY (key, kind, time)
          ) WITH CLUSTERING ORDER BY (kind ASC, time DESC)
          

          Now, if the query is:

          SELECT kind, time FROM timeseries WHERE key = <somevalue> LIMIT 200;
          

          then, if the DESC above is "just an optimisation for reversed queries, then the expected result is (say):

          kind | time
          -----------
             0 |    0
             0 |    1
             ...
             0 |   99
             0 |  100
             1 |    0
             1 |    1
             1 |    2
             ...
          

          but the physical layout is now in fact:

          kind | time
          -----------
             0 |  100
             0 |   99
             ...
             0 |    1
             0 |    0
             1 |  100
             1 |   99
             1 |   98
             ...
          

          I don't see how to implement that query efficiently (without potentially making many queries).

          Lastly, while I think that changing the logical ordering is the correct way to deal with this, the question of the syntax is another matter. I do happen to like the syntax in the description of this ticket, but I don't care too much either.

          Show
          Sylvain Lebresne added a comment - but I don't think it makes sense for CQL. Why wouldn't it? The notion of what is largest or smallest only make sense once you've defined what ordering you're talking about (what I'm calling the logical ordering). We do still allow in CQL custom orderings (which is useful), so why giving a simple syntax to define the reverse ordering or an existing one wouldn't make sense? With my first patch, "ORDER BY X DESC" does always return the largest X first, given the ordering. but it shouldn't change the semantics of the query itself To be precise, it doesn't change the semantic of the query, it changes the logical ordering (which happens to be the same than the physical one but that last part is an implementation detail) of records in the table. Now, looking more closely at the alternative of keeping the logical ordering unchanged but changing the physical ordering (in order to get faster reversed queries), I think this just doesn't work. And by "doesn't work", I mean that as soon as we have composites, it would be costly to implement (making it useless). Typically, suppose you follow that idea and declare: CREATE TABLE timeseries ( key text, kind int, time timestamp, value text, PRIMARY KEY (key, kind, time) ) WITH CLUSTERING ORDER BY (kind ASC, time DESC) Now, if the query is: SELECT kind, time FROM timeseries WHERE key = <somevalue> LIMIT 200; then, if the DESC above is "just an optimisation for reversed queries, then the expected result is (say): kind | time ----------- 0 | 0 0 | 1 ... 0 | 99 0 | 100 1 | 0 1 | 1 1 | 2 ... but the physical layout is now in fact: kind | time ----------- 0 | 100 0 | 99 ... 0 | 1 0 | 0 1 | 100 1 | 99 1 | 98 ... I don't see how to implement that query efficiently (without potentially making many queries). Lastly, while I think that changing the logical ordering is the correct way to deal with this, the question of the syntax is another matter. I do happen to like the syntax in the description of this ticket, but I don't care too much either.
          Hide
          Jonathan Ellis added a comment -

          it doesn't change the semantic of the query, it changes the logical ordering (which happens to be the same than the physical one but that last part is an implementation detail) of records in the table

          From the standpoint of a new user, this is sophistry. If I ask for X DESC but I get X1 before X10, then you've most certainly changed the semantics of my query out from under me.

          With Thrift we've been doing the equivalent of writing assembly code. CQL lets us step up a level of abstraction. This is a Good Thing, but it will require a little getting used to for us veterans.

          if the DESC above is "just an optimisation for reversed queries"

          Remember that SELECT X WHERE Y is NOT the reverse order from SELECT X WHERE Y ORDER BY X DESC, or put another way, ORDER BY is never implicit. If ORDER BY is unspecified, any order is valid, so we are free to use the most efficient one.

          (The other relevant context here is that we decided not to support arbitrary orderings. The two valid orderings a user may request correspond to physical/clustered order, and the reverse of that. Both of which are reasonably efficient, but the former is more so.)

          So, I think we should continue to enforce that restriction, and allow clustering order only to optimize for forwards/backwards (and by implication, to affect the default result ordering). In your example, the permissible orderings would be KIND ASC, time DESC (most efficient, and what we should use when no explicit ORDER BY is given) and KIND DESC, time ASC.

          the question of the syntax is another matter

          My syntax preference falls out of the way I view the problem above. I.e., that it's there if you want to optimize for DESC but it doesn't change things in a really fundamental way. So I'd prefer to have it in the WITH extensions section along with similar options like COMPACT STORAGE.

          Show
          Jonathan Ellis added a comment - it doesn't change the semantic of the query, it changes the logical ordering (which happens to be the same than the physical one but that last part is an implementation detail) of records in the table From the standpoint of a new user, this is sophistry. If I ask for X DESC but I get X1 before X10, then you've most certainly changed the semantics of my query out from under me. With Thrift we've been doing the equivalent of writing assembly code. CQL lets us step up a level of abstraction. This is a Good Thing, but it will require a little getting used to for us veterans. if the DESC above is "just an optimisation for reversed queries" Remember that SELECT X WHERE Y is NOT the reverse order from SELECT X WHERE Y ORDER BY X DESC , or put another way, ORDER BY is never implicit. If ORDER BY is unspecified, any order is valid, so we are free to use the most efficient one. (The other relevant context here is that we decided not to support arbitrary orderings. The two valid orderings a user may request correspond to physical/clustered order, and the reverse of that. Both of which are reasonably efficient, but the former is more so.) So, I think we should continue to enforce that restriction, and allow clustering order only to optimize for forwards/backwards (and by implication, to affect the default result ordering). In your example, the permissible orderings would be KIND ASC, time DESC (most efficient, and what we should use when no explicit ORDER BY is given) and KIND DESC, time ASC. the question of the syntax is another matter My syntax preference falls out of the way I view the problem above. I.e., that it's there if you want to optimize for DESC but it doesn't change things in a really fundamental way. So I'd prefer to have it in the WITH extensions section along with similar options like COMPACT STORAGE.
          Hide
          Sylvain Lebresne added a comment -

          From the standpoint of a new user, this is sophistry.

          How is that sophistry, seriously? I think it's very important that the clustered part of th PK induces an ordering of records (which SQL don't have, but we're not talking about SQL here, right). It's important because you don't model things the same way in Cassandra than traditionally you do in SQL: you denormalize, you model time series etc... for which the notion that there is an ordering or records is not an implementation detail, nor something dealt with at query time (contrarily to SQL), but is an important part of the model. It would be confusing for brand new user to not say that ordering is part of the data model (and again yes, that's a difference with SQL). I also don't see how saying that is in any way related to being a veteran and whatnot. I 200% agree that CQL3 is an abstraction and that it is A Good Thing. I'm saying the ordering induced by the PK should be part of that abstraction.

          But then it's natural that SELECT without ORDER BY should return records in that clustering order, which will indeed not be the same than the order of with ORDER BY ASC unless Y is the first clustered key. But if is the first clustered key, then yes, SELECT and SELECT ORDER BY ASC should be the same (and they are). But then it's not rocket science to say that if the ordering is 'reversed alphabetical order', then 'z' > 'a' and thus a SELECT ORDER BY ASC returns 'z' before 'a'.

          So I absolutely and strongly refute that this proposal is somehow sophistry and even more so that it's a negation of the abstractive nature of CQL3 or influenced by the thrift API any more that the solution you're pushing for.

          The other relevant context here is that we decided not to support arbitrary orderings

          I'm either misunderstanding what you call 'arbitrary orderings' or I have not been part of that discussion. Because if you talk of custom types, then CQL3 does support them (you can declare CREATE TABLE foo (k "myCustomType" PRIMARY KEY)). And I'm -1 on removing that support, unless someone has compelling reason to do so because I certainly don't see any and that's useful. And yes, I do see this as a good reason to go with my proposal, since it's not very consistent if

          CREATE TABLE foo (
              k1 uuid,
              k2 "myEfficientComplexNumberType",
              c text,
              PRIMARY KEY (k1, k2)
          ) WITH CLUSTERING ORDER BY (k2 DESC)
          

          is not the same than

          CREATE TABLE foo (
              k1 uuid,
              k2 "myReversedEfficientComplexNumberType",
              c text,
              PRIMARY KEY (k1, k2)
          )
          
          Show
          Sylvain Lebresne added a comment - From the standpoint of a new user, this is sophistry. How is that sophistry, seriously? I think it's very important that the clustered part of th PK induces an ordering of records (which SQL don't have, but we're not talking about SQL here, right). It's important because you don't model things the same way in Cassandra than traditionally you do in SQL: you denormalize, you model time series etc... for which the notion that there is an ordering or records is not an implementation detail, nor something dealt with at query time (contrarily to SQL), but is an important part of the model. It would be confusing for brand new user to not say that ordering is part of the data model (and again yes, that's a difference with SQL). I also don't see how saying that is in any way related to being a veteran and whatnot. I 200% agree that CQL3 is an abstraction and that it is A Good Thing. I'm saying the ordering induced by the PK should be part of that abstraction. But then it's natural that SELECT without ORDER BY should return records in that clustering order, which will indeed not be the same than the order of with ORDER BY ASC unless Y is the first clustered key. But if is the first clustered key, then yes, SELECT and SELECT ORDER BY ASC should be the same (and they are). But then it's not rocket science to say that if the ordering is 'reversed alphabetical order', then 'z' > 'a' and thus a SELECT ORDER BY ASC returns 'z' before 'a'. So I absolutely and strongly refute that this proposal is somehow sophistry and even more so that it's a negation of the abstractive nature of CQL3 or influenced by the thrift API any more that the solution you're pushing for. The other relevant context here is that we decided not to support arbitrary orderings I'm either misunderstanding what you call 'arbitrary orderings' or I have not been part of that discussion. Because if you talk of custom types, then CQL3 does support them (you can declare CREATE TABLE foo (k "myCustomType" PRIMARY KEY)). And I'm -1 on removing that support, unless someone has compelling reason to do so because I certainly don't see any and that's useful. And yes, I do see this as a good reason to go with my proposal, since it's not very consistent if CREATE TABLE foo ( k1 uuid, k2 "myEfficientComplexNumberType", c text, PRIMARY KEY (k1, k2) ) WITH CLUSTERING ORDER BY (k2 DESC) is not the same than CREATE TABLE foo ( k1 uuid, k2 "myReversedEfficientComplexNumberType", c text, PRIMARY KEY (k1, k2) )
          Hide
          Jonathan Ellis added a comment -

          I'm either misunderstanding what you call 'arbitrary orderings' or I have not been part of that discussion

          I think you are misunderstanding. This is what I'm referring to:

          .           if (stmt.parameters.orderBy != null)
                      {
                          CFDefinition.Name name = cfDef.get(stmt.parameters.orderBy);
                          if (name == null)
                              throw new InvalidRequestException(String.format("Order by on unknown column %s", stmt.parameters.orderBy));
          
                          if (name.kind != CFDefinition.Name.Kind.COLUMN_ALIAS || name.position != 0)
                              throw new InvalidRequestException(String.format("Order by is currently only supported on the second column of the PRIMARY KEY (if any), got %s", stmt.parameters.orderBy));
                      }
          

          How is that sophistry, seriously?

          "ORDER BY X DESC" does not mean "give me them in the reverse order that Xes are in on disk", it means "give me larger values before smaller ones." This isn't open for debate, it's a very clear requirement.

          Remember that clustering is not new ground for databases; SQL has been there, done that. As I mentioned when we were designing the CQL3 schema syntax, RDBMSes have had a concept of clustered indexes for a long, long time. But clustering on an index ASC or DESC does not affect the results other than as an optimization; when you ORDER BY X, that's what you get.

          SQL and CQL are declarative languages: "Here is what I want; you figure out how to give me the results." This has proved a good design. Modifying the semantics of a query based on index or clustering or other declarations elsewhere has ZERO precedent and is bad design to boot; you don't want users to have to consult their DDL when debugging, to know what results a query will give.

          Thus, the only design that makes sense in the larger context of a declarative language is to treat the clustering as an optimization as I've described (or "as an index", if you prefer), and continue to reject ORDER BY requests that are neither forward- nor reverse-clustered.

          Show
          Jonathan Ellis added a comment - I'm either misunderstanding what you call 'arbitrary orderings' or I have not been part of that discussion I think you are misunderstanding. This is what I'm referring to: . if (stmt.parameters.orderBy != null ) { CFDefinition.Name name = cfDef.get(stmt.parameters.orderBy); if (name == null ) throw new InvalidRequestException( String .format( "Order by on unknown column %s" , stmt.parameters.orderBy)); if (name.kind != CFDefinition.Name.Kind.COLUMN_ALIAS || name.position != 0) throw new InvalidRequestException( String .format( "Order by is currently only supported on the second column of the PRIMARY KEY ( if any), got %s" , stmt.parameters.orderBy)); } How is that sophistry, seriously? "ORDER BY X DESC" does not mean "give me them in the reverse order that Xes are in on disk", it means "give me larger values before smaller ones." This isn't open for debate, it's a very clear requirement. Remember that clustering is not new ground for databases; SQL has been there, done that. As I mentioned when we were designing the CQL3 schema syntax, RDBMSes have had a concept of clustered indexes for a long, long time. But clustering on an index ASC or DESC does not affect the results other than as an optimization; when you ORDER BY X, that's what you get. SQL and CQL are declarative languages: "Here is what I want; you figure out how to give me the results." This has proved a good design. Modifying the semantics of a query based on index or clustering or other declarations elsewhere has ZERO precedent and is bad design to boot; you don't want users to have to consult their DDL when debugging, to know what results a query will give. Thus, the only design that makes sense in the larger context of a declarative language is to treat the clustering as an optimization as I've described (or "as an index", if you prefer), and continue to reject ORDER BY requests that are neither forward- nor reverse-clustered.
          Hide
          Sylvain Lebresne added a comment -

          This is what I'm referring to:

          Wait, what happened to "Third (and this is the big one) I strongly suspect that we're going to start supporting at least limited run-time ordering in the near future" from CASSANDRA-3925. How can I reconcile that with "The other relevant context here is that we decided not to support arbitrary orderings"?

          "ORDER BY X DESC" does not mean "give me them in the reverse order that Xes are in on disk"

          I never suggested that, not even a little. Not more than you did.

          it means "give me larger values before smaller ones." This isn't open for debate, it's a very clear requirement

          Sure. But the definition of larger versus smaller depends on what ordering you are talking about. This isn't open for debate either. Math have closed that debate for ages. And SQL is not excluded from that rule, but it just happens that SQL has default orderings (based on the column type) and you can't define new ones. But we can do that in CQL. We can independently of this ticket because of custom types.

          Again, once you consider custom types (which we have), you can't hide behind that the fact that value X is larger than Y depends on the ordering induces by your custom types. That's the ASC order, and DESC is the reverse of that. If someone define it's own custom types being "reverseIntegerType", how can you avoid SELECT ORDER BY DESC to not return 1 before 3? You can't, and returning 1 before 3 absolutely make sense because 1 is larger than 3 if the order is 'reverseInteger'.

          SQL and CQL are declarative languages: "Here is what I want; you figure out how to give me the results." This has proved a good design.

          Sure, nothing in what I'm suggesting is at odd with that.

          Modifying the semantics of a query based on index or clustering

          Again, I'm not suggesting any such thing at all. The semantic of a SELECT X ORDER BY Y depends on what ordering relation is defined for Y because the ordering relation is what defines the order. SQL has a limited and non customizable set of types and (implicitly) define an ordering relation for each one. If one type was 'thing' it would have to define the ordering of 'thing' otherwise ORDER BY queries wouldn't be properly defined. CQL also has a default set of types which have associated ordering relation. I'm only suggesting we add a simple syntax so that given a type/relation (a default one or a custom one btw), we can define the type/ordering relation that validate the same value but have the reversed ordering.

          Show
          Sylvain Lebresne added a comment - This is what I'm referring to: Wait, what happened to "Third (and this is the big one) I strongly suspect that we're going to start supporting at least limited run-time ordering in the near future" from CASSANDRA-3925 . How can I reconcile that with "The other relevant context here is that we decided not to support arbitrary orderings"? "ORDER BY X DESC" does not mean "give me them in the reverse order that Xes are in on disk" I never suggested that, not even a little. Not more than you did. it means "give me larger values before smaller ones." This isn't open for debate, it's a very clear requirement Sure. But the definition of larger versus smaller depends on what ordering you are talking about. This isn't open for debate either. Math have closed that debate for ages. And SQL is not excluded from that rule, but it just happens that SQL has default orderings (based on the column type) and you can't define new ones. But we can do that in CQL. We can independently of this ticket because of custom types. Again, once you consider custom types (which we have), you can't hide behind that the fact that value X is larger than Y depends on the ordering induces by your custom types. That's the ASC order, and DESC is the reverse of that. If someone define it's own custom types being "reverseIntegerType", how can you avoid SELECT ORDER BY DESC to not return 1 before 3? You can't, and returning 1 before 3 absolutely make sense because 1 is larger than 3 if the order is 'reverseInteger'. SQL and CQL are declarative languages: "Here is what I want; you figure out how to give me the results." This has proved a good design. Sure, nothing in what I'm suggesting is at odd with that. Modifying the semantics of a query based on index or clustering Again, I'm not suggesting any such thing at all. The semantic of a SELECT X ORDER BY Y depends on what ordering relation is defined for Y because the ordering relation is what defines the order . SQL has a limited and non customizable set of types and (implicitly) define an ordering relation for each one. If one type was 'thing' it would have to define the ordering of 'thing' otherwise ORDER BY queries wouldn't be properly defined. CQL also has a default set of types which have associated ordering relation. I'm only suggesting we add a simple syntax so that given a type/relation (a default one or a custom one btw), we can define the type/ordering relation that validate the same value but have the reversed ordering.
          Hide
          Jonathan Ellis added a comment - - edited

          what happened to "Third (and this is the big one) I strongly suspect that we're going to start supporting at least limited run-time ordering in the near future" from CASSANDRA-3925

          Nothing, except that it's a separate ticket's worth of work.

          I never suggested that [ORDER BY depends on disk order], not even a little. Not more than you did.

          I really don't see the distinction between saying "disk order" and "clustering order," as in "the clustered part of th PK induces an ordering of records ... SELECT without ORDER BY should return records in that clustering order ... SELECT ORDER BY ASC returns 'z' before 'a'."

          But disk order or clustering order, I don't care which you call it; I reject both as modifiers of the semantics of ASC and DESC. (But again, SELECT with no explicit ORDER BY is free to return in any order we like.)

          the fact that value X is larger than Y depends on the ordering induces by your custom types

          Agreed. But that's not the same as reverse-clustering on a type: "y int ... PRIMARY KEY (x, y DESC)" (to use your syntax) is NOT the same as "y ReversedInt ... PRIMARY KEY (x, y)." In the former, ORDER BY Y DESC should give larger Y before smaller (that is, 100 before 1); in the latter, the reverse.

          Show
          Jonathan Ellis added a comment - - edited what happened to "Third (and this is the big one) I strongly suspect that we're going to start supporting at least limited run-time ordering in the near future" from CASSANDRA-3925 Nothing, except that it's a separate ticket's worth of work. I never suggested that [ORDER BY depends on disk order] , not even a little. Not more than you did. I really don't see the distinction between saying "disk order" and "clustering order," as in "the clustered part of th PK induces an ordering of records ... SELECT without ORDER BY should return records in that clustering order ... SELECT ORDER BY ASC returns 'z' before 'a'." But disk order or clustering order, I don't care which you call it; I reject both as modifiers of the semantics of ASC and DESC. (But again, SELECT with no explicit ORDER BY is free to return in any order we like.) the fact that value X is larger than Y depends on the ordering induces by your custom types Agreed. But that's not the same as reverse-clustering on a type: "y int ... PRIMARY KEY (x, y DESC)" (to use your syntax) is NOT the same as "y ReversedInt ... PRIMARY KEY (x, y)." In the former, ORDER BY Y DESC should give larger Y before smaller (that is, 100 before 1); in the latter, the reverse.
          Hide
          Sylvain Lebresne added a comment -

          Nothing, except that it's a separate ticket's worth of work.

          Oh ok. For the records I didn't implied otherwise.

          But that's not the same as reverse-clustering on a type: "y int ... PRIMARY KEY (x, y DESC)" (to use your syntax) is NOT the same as "y ReversedInt ... PRIMARY KEY (x, y)." In the former, ORDER BY Y DESC

          What?! I said that I wasn't sure my syntax was good. But with all I've said I expected it was clear that what I want to do with this ticket from day one is to allow to define "y ReversedInt ... PRIMARY KEY" but without having to write a custom java class since we don't have to and that is exactly what my patch implements. I'm fine saying my syntax suck and allow to write is "y reversed(int) .. PK". But to be clear, I don't think that option is a bad fit at all for CQL3, and that's not the C* veteran talk.

          In the former, ORDER BY Y DESC should give larger Y before smaller (that is, 100 before 1); in the latter, the reverse

          To my defence, you're attributing your semantic to my made up syntax (which again, may be is counter-intuitive to you with your background but is really not to me, and I made it clear that it was a suggestion. I even said in the description that "Alternatively, the DESC could also be put after the column name definition").

          I really don't see the distinction between saying "disk order" and "clustering order," as in "the clustered part of th PK induces an ordering of records ...

          Maybe with the reversed(int) syntax it makes it more clear, but when I talk about ordering of records, I'm saying that we should say that in CQL the model defines an ordering of the rows (where rows is in the sense of SQL) in tables, order that is defined as the ordering implied by the types of the "clustered" keys (and to be clear, I don't care what clustering mean in SQL, I'm reusing the name because you're using it, but I only mean by that term the fields in the PK after the first one). That doesn't imply the disk order has to respect it (though it will but that's an implementation detail). In other words, and somewhat unrelated to this issue, I think there would be value to say that the order of SELECT without any ORDER BY is something defined by CQL (while SQL does not do that). I think there would be value because I think it helps understanding which model are a good fit for CQL.

          Now, and to sum up, I think that having the "y reversed(int)" syntax has the following advantages over just allowing to change the on-disk order:

          1. I do think that in most case it's more natural to define a reversed type rather than just adding an optim for reversed queries. Typically, it means that 'y reversed("myCustomType")' is the same than 'y "myReversedCustomType"' which has a nice consistency to it. In the alternative, and even though I'm not saying it's ill defined in any way, I do think that have a form of syntactic double negation that is not equivalent to removing both is kind of weird.
          2. Though that seems to be very clear to you, I do think that it's not necessarily clear per se (i.e to anyone that may not be familiar with SQL clustering for instance) that "WITH CLUSTERING ORDER (x DESC)" does not change the ordering (and by that I mean 'does not semantically mean "x reversed(type)"').
          3. With that solution, we can maintain (without doing anything) the fact that a select without ORDERING respect the ordering implied by the "clustering". I think it's convenient for C*. Again, lots of efficient model for C* uses that ordering, so it feels like a better idea to say 'oh, and contrarily to SQL the order of records in a table is defined (and thus the default ordering or SELECT) and lots of good modeling pattern for C* rely on this'.
          Show
          Sylvain Lebresne added a comment - Nothing, except that it's a separate ticket's worth of work. Oh ok. For the records I didn't implied otherwise. But that's not the same as reverse-clustering on a type: "y int ... PRIMARY KEY (x, y DESC)" (to use your syntax) is NOT the same as "y ReversedInt ... PRIMARY KEY (x, y)." In the former, ORDER BY Y DESC What?! I said that I wasn't sure my syntax was good. But with all I've said I expected it was clear that what I want to do with this ticket from day one is to allow to define "y ReversedInt ... PRIMARY KEY" but without having to write a custom java class since we don't have to and that is exactly what my patch implements. I'm fine saying my syntax suck and allow to write is "y reversed(int) .. PK". But to be clear, I don't think that option is a bad fit at all for CQL3, and that's not the C* veteran talk. In the former, ORDER BY Y DESC should give larger Y before smaller (that is, 100 before 1); in the latter, the reverse To my defence, you're attributing your semantic to my made up syntax (which again, may be is counter-intuitive to you with your background but is really not to me, and I made it clear that it was a suggestion. I even said in the description that "Alternatively, the DESC could also be put after the column name definition"). I really don't see the distinction between saying "disk order" and "clustering order," as in "the clustered part of th PK induces an ordering of records ... Maybe with the reversed(int) syntax it makes it more clear, but when I talk about ordering of records, I'm saying that we should say that in CQL the model defines an ordering of the rows (where rows is in the sense of SQL) in tables, order that is defined as the ordering implied by the types of the "clustered" keys (and to be clear, I don't care what clustering mean in SQL, I'm reusing the name because you're using it, but I only mean by that term the fields in the PK after the first one). That doesn't imply the disk order has to respect it (though it will but that's an implementation detail). In other words, and somewhat unrelated to this issue, I think there would be value to say that the order of SELECT without any ORDER BY is something defined by CQL (while SQL does not do that). I think there would be value because I think it helps understanding which model are a good fit for CQL. Now, and to sum up, I think that having the "y reversed(int)" syntax has the following advantages over just allowing to change the on-disk order: I do think that in most case it's more natural to define a reversed type rather than just adding an optim for reversed queries. Typically, it means that 'y reversed("myCustomType")' is the same than 'y "myReversedCustomType"' which has a nice consistency to it. In the alternative, and even though I'm not saying it's ill defined in any way, I do think that have a form of syntactic double negation that is not equivalent to removing both is kind of weird. Though that seems to be very clear to you, I do think that it's not necessarily clear per se (i.e to anyone that may not be familiar with SQL clustering for instance) that "WITH CLUSTERING ORDER (x DESC)" does not change the ordering (and by that I mean 'does not semantically mean "x reversed(type)"'). With that solution, we can maintain (without doing anything) the fact that a select without ORDERING respect the ordering implied by the "clustering". I think it's convenient for C*. Again, lots of efficient model for C* uses that ordering, so it feels like a better idea to say 'oh, and contrarily to SQL the order of records in a table is defined (and thus the default ordering or SELECT) and lots of good modeling pattern for C* rely on this'.
          Hide
          Jonathan Ellis added a comment - - edited

          the model defines an ordering of the rows (where rows is in the sense of SQL) in tables, order that is defined as the ordering implied by the types of the "clustered" keys (and to be clear, I don't care what clustering mean in SQL, I'm reusing the name because you're using it, but I only mean by that term the fields in the PK after the first one). That doesn't imply the disk order has to respect it

          I think the mental model of rows as predicates, queries returning sets of rows with no inherent order, and ORDER BY as specifying the desired order, is much simpler and easier to reason about (see prior point about having to consult DDL + QUERY to figure out what order results are supposed to appear in).

          To my defence, you're attributing your semantic to my made up syntax

          I was trying to say that I view ReversedType(Int32Type) as modification of Int32Type (which should not affect int ordering) and not a completely new type, the way the (hypothetical) ReversedInt (or BackwardsInt, or AlmostNotQuiteInt) type would be. Since the latter isn't really related to an int at all, even though they look a lot like ints in many respects.

          I do think that in most case it's more natural to define a reversed type rather than just adding an optim for reversed queries.

          I don't follow.

          I do think that have a form of syntactic double negation that is not equivalent to removing both is kind of weird... I do think that it's not necessarily clear per se (i.e to anyone that may not be familiar with SQL clustering for instance) that "WITH CLUSTERING ORDER (x DESC)" does not change the ordering

          But saying "ORDER BY X DESC always gives you higher X first" (and ASC always gives you lower first) is the only way to avoid the double negation! Otherwise in your original syntax of PK (X, Y DESC), the only way to get 1 to sort before 100 is to ask for ORDER BY Y DESC so the DESC cancel out!

          I just can't agree that "ORDER BY Y DESC" giving

          {1, 100}

          is going to be less confusing than

          {100, 1}

          , no matter how much we tell users, "No, you see, it's really just reversing the clustering order, which you already reversed..."

          Users may not be familiar with clustering, but they're very familiar with ORDER BY, which as I said above, is very clear on what it does. Clustering is the closest example of how performance hints should not change the semantics of the query, but indexes fall into the same category.

          It may also be worth pointing out that it's worth preserving CQL compatibility with Hive; queries that execute on both (and to the best of my knowledge CQL3 is a strict subset of Hive SQL) should not give different results.

          Show
          Jonathan Ellis added a comment - - edited the model defines an ordering of the rows (where rows is in the sense of SQL) in tables, order that is defined as the ordering implied by the types of the "clustered" keys (and to be clear, I don't care what clustering mean in SQL, I'm reusing the name because you're using it, but I only mean by that term the fields in the PK after the first one). That doesn't imply the disk order has to respect it I think the mental model of rows as predicates, queries returning sets of rows with no inherent order, and ORDER BY as specifying the desired order, is much simpler and easier to reason about (see prior point about having to consult DDL + QUERY to figure out what order results are supposed to appear in). To my defence, you're attributing your semantic to my made up syntax I was trying to say that I view ReversedType(Int32Type) as modification of Int32Type (which should not affect int ordering) and not a completely new type, the way the (hypothetical) ReversedInt (or BackwardsInt, or AlmostNotQuiteInt) type would be. Since the latter isn't really related to an int at all, even though they look a lot like ints in many respects. I do think that in most case it's more natural to define a reversed type rather than just adding an optim for reversed queries. I don't follow. I do think that have a form of syntactic double negation that is not equivalent to removing both is kind of weird... I do think that it's not necessarily clear per se (i.e to anyone that may not be familiar with SQL clustering for instance) that "WITH CLUSTERING ORDER (x DESC)" does not change the ordering But saying " ORDER BY X DESC always gives you higher X first" (and ASC always gives you lower first) is the only way to avoid the double negation! Otherwise in your original syntax of PK (X, Y DESC), the only way to get 1 to sort before 100 is to ask for ORDER BY Y DESC so the DESC cancel out! I just can't agree that "ORDER BY Y DESC" giving {1, 100} is going to be less confusing than {100, 1} , no matter how much we tell users, "No, you see, it's really just reversing the clustering order, which you already reversed..." Users may not be familiar with clustering, but they're very familiar with ORDER BY, which as I said above, is very clear on what it does. Clustering is the closest example of how performance hints should not change the semantics of the query, but indexes fall into the same category. It may also be worth pointing out that it's worth preserving CQL compatibility with Hive; queries that execute on both (and to the best of my knowledge CQL3 is a strict subset of Hive SQL) should not give different results.
          Hide
          Sylvain Lebresne added a comment -

          I think the mental model of rows as predicates, queries returning sets of rows with no inherent order, and ORDER BY as specifying the desired order, is much simpler and easier to reason about

          Much simpler and easier than what? You're pretending that what I'm saying is somehow a complete revolution where it's not. I'm only suggesting that it would be a good idea for CQL to say that by default rows are returned in a defined order. Saying the order is defined (by opposition to not be) hardly breaks the mental mode. And that certainly does not change in any way that ORDER BY would still specify the desired order.

          having to consult DDL + QUERY to figure out what order results are supposed to appear in

          Again, imho that does not apply to my suggestion at all. People already have to know their DDL for QUERY. They need to know which column names are defined and which type they have (since without knowing the type you cannot know the ordering). Since all I'm suggesting is a convenient syntax to define "x ReversedInt", and since you can already define "x ReversedInt" (provided you write such AbstractType), my suggestion doesn't change one bit how much people will have to consult the DDL.

          Users may not be familiar with clustering, but they're very familiar with ORDER BY

          I've never suggested to change the semantic of ORDER BY at all. ORDER BY x ASC is defined as 'return rows in the order induces by the type of x'. I'm suggesting a syntax to define new types from existing ones. How is that even close to be related to changing what ORDER BY does?

          Anyway, I'm still convinced that adding a simple syntax to allow to define a type with reversed order given an existing type (custom or predefined) is in fact cleaner from the point of semantic than the alternative you are suggesting (and it is certainly not the semantic disrupting monster you're seem to be suggesting it is for a reason that is beyond me), and at a personal level I find it more natural (as in: the syntax naturally imply the semantic).

          But anyway, since we're going nowhere, I'm giving up.

          I'm attaching the patch with the alternative you've suggested. I do not think that's the best solution for CQL but from a pure technical point of point it does is a solution so so be it.

          Show
          Sylvain Lebresne added a comment - I think the mental model of rows as predicates, queries returning sets of rows with no inherent order, and ORDER BY as specifying the desired order, is much simpler and easier to reason about Much simpler and easier than what? You're pretending that what I'm saying is somehow a complete revolution where it's not. I'm only suggesting that it would be a good idea for CQL to say that by default rows are returned in a defined order. Saying the order is defined (by opposition to not be) hardly breaks the mental mode. And that certainly does not change in any way that ORDER BY would still specify the desired order. having to consult DDL + QUERY to figure out what order results are supposed to appear in Again, imho that does not apply to my suggestion at all. People already have to know their DDL for QUERY. They need to know which column names are defined and which type they have (since without knowing the type you cannot know the ordering ). Since all I'm suggesting is a convenient syntax to define "x ReversedInt", and since you can already define "x ReversedInt" (provided you write such AbstractType), my suggestion doesn't change one bit how much people will have to consult the DDL. Users may not be familiar with clustering, but they're very familiar with ORDER BY I've never suggested to change the semantic of ORDER BY at all . ORDER BY x ASC is defined as 'return rows in the order induces by the type of x'. I'm suggesting a syntax to define new types from existing ones. How is that even close to be related to changing what ORDER BY does? Anyway, I'm still convinced that adding a simple syntax to allow to define a type with reversed order given an existing type (custom or predefined) is in fact cleaner from the point of semantic than the alternative you are suggesting (and it is certainly not the semantic disrupting monster you're seem to be suggesting it is for a reason that is beyond me), and at a personal level I find it more natural (as in: the syntax naturally imply the semantic). But anyway, since we're going nowhere, I'm giving up. I'm attaching the patch with the alternative you've suggested. I do not think that's the best solution for CQL but from a pure technical point of point it does is a solution so so be it.
          Hide
          Jonathan Ellis added a comment -

          I may be misreading this, but it looks like it treats ORDER BY X DESC, Y ASC the same as ORDER BY Y ASC, X DESC since it iterates in HashMap's entrySet order. A simple fix might be to use a LinkedHashMap.

          Show
          Jonathan Ellis added a comment - I may be misreading this, but it looks like it treats ORDER BY X DESC, Y ASC the same as ORDER BY Y ASC, X DESC since it iterates in HashMap's entrySet order. A simple fix might be to use a LinkedHashMap.
          Hide
          Sylvain Lebresne added a comment -

          Which hashmap are you talking about? If it's stmt.parameters.orderings, the actual order of elements in the map is not used by the code so that shouldn't matter. I'll note there is a test for this patch at https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py (the line require('#4004') needs to be commented to actually run the test) showing that ORDER BY X DESC, Y ASC is the reverse of ORDER BY Y ASC, X DESC.

          As a side note, I've convinced myself that 4004_alternative.txt is maybe a bit more natural than creating a new, reversed type so I'm good with that.

          Show
          Sylvain Lebresne added a comment - Which hashmap are you talking about? If it's stmt.parameters.orderings , the actual order of elements in the map is not used by the code so that shouldn't matter. I'll note there is a test for this patch at https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py (the line require('#4004') needs to be commented to actually run the test) showing that ORDER BY X DESC, Y ASC is the reverse of ORDER BY Y ASC, X DESC. As a side note, I've convinced myself that 4004_alternative.txt is maybe a bit more natural than creating a new, reversed type so I'm good with that.
          Hide
          Jonathan Ellis added a comment -

          My concern is that that the order of the terms in the ORDER BY clause is relevant – in the dtest, c1 is always given first (which is valid). But I don't think the code would reject c2 DESC, c1 ASC, although it should.

          So what I'm looking for is either a reference to the ordering from columnAliases, or an ordered map for definedOrderings, which I'm not seeing.

          Show
          Jonathan Ellis added a comment - My concern is that that the order of the terms in the ORDER BY clause is relevant – in the dtest, c1 is always given first (which is valid). But I don't think the code would reject c2 DESC, c1 ASC , although it should. So what I'm looking for is either a reference to the ordering from columnAliases , or an ordered map for definedOrderings , which I'm not seeing.
          Hide
          Sylvain Lebresne added a comment -

          You're right. Attached v2 fixes that (and I've added a check we do catch that in the dtest).

          Show
          Sylvain Lebresne added a comment - You're right. Attached v2 fixes that (and I've added a check we do catch that in the dtest).
          Hide
          Jonathan Ellis added a comment -

          +1

          Show
          Jonathan Ellis added a comment - +1
          Hide
          Sylvain Lebresne added a comment -

          Committed, thanks

          Show
          Sylvain Lebresne added a comment - Committed, thanks

            People

            • Assignee:
              Sylvain Lebresne
              Reporter:
              Sylvain Lebresne
              Reviewer:
              Jonathan Ellis
            • Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development