Cassandra / CASSANDRA-9231

Support Routing Key as part of Partition Key

    Details

    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Provide support for sub-dividing the partition key into a routing key and a non-routing key component. Currently, all columns that make up the partition key of the primary key are also routing keys, i.e. they determine which nodes store the data. This proposal would give the data modeler the ability to designate only a subset of the columns that comprise the partition key to be routing keys. The non-routing key columns of the partition key identify the partition but are not used to determine where to store the data.

      Consider the following example table definition:
      CREATE TABLE foo (
      a int,
      b int,
      c int,
      d int,
      PRIMARY KEY (([a], b), c ) );

      (a,b) is the partition key, c is the clustering key, and d is just a column. In addition, the square brackets identify the routing key as column a. This means that only the value of column a is used to determine the node for data placement (i.e. only the value of column a is murmur3 hashed to compute the token). In addition, column b is needed to identify the partition but does not influence the placement.

      This has the benefit that all rows with the same routing key (but potentially different non-routing key columns of the partition key) are stored on the same node, and that knowledge of such co-locality can be exploited by applications built on top of Cassandra.
      Currently, the only way to achieve co-locality is within a partition. However, this approach has the limitations that: a) there are theoretical and (more importantly) practical limitations on the size of a partition, and b) rows within a partition are ordered and an index is built to exploit such ordering. For large partitions, that overhead is significant if ordering isn't needed.
      In other words, routing keys afford a simple means to achieve scalable node-level co-locality without ordering while clustering keys afford page-level co-locality with ordering. As such, they address different co-locality needs giving the data modeler the flexibility to choose what is needed for their application.
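The placement rule described above can be illustrated with a minimal Python sketch. This is hypothetical illustration code, not Cassandra internals: md5 stands in for the Murmur3 hash, and `token` is an invented helper.

```python
import hashlib

def token(values):
    # Stand-in for Murmur3: hash a tuple of key values to a 64-bit token.
    digest = hashlib.md5(repr(values).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# Partition key (a, b); the proposal designates only (a,) as the routing key.
# Placement hashes the routing key alone, so rows that share `a` map to the
# same token (and hence the same node) even when `b` differs.
t_row1 = token((1,))      # partition key (a=1, b=10): placement uses a only
t_row2 = token((1,))      # partition key (a=1, b=20): same routing key
t_full = token((1, 10))   # today's behaviour: the whole partition key is hashed

assert t_row1 == t_row2   # co-located on one node
assert t_row1 != t_full   # hashing the full partition key would scatter them
```

Under this scheme the full partition key still identifies the partition; only the bracketed subset feeds the hash.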

        Issue Links

          Activity

          jbellis Jonathan Ellis added a comment -

          Repair: shouldn't be an issue now that we have incremental mode.

          Compaction: unclear how much extra write amplification will happen vs having them in separate partitions but same machine. (vnode-based compaction doesn't help with either one.) On balance I'd say we'd be well served by improving compaction in general.

          jjordan Jeremiah Jordan added a comment -

          I think we probably have other issues to solve besides CASSANDRA-9754 for multi-GB partitions to be viable? Are you not still going to have operational issues around repairing them and compacting them?

          jbellis Jonathan Ellis added a comment -

          CASSANDRA-9754 is making good progress, which should obviate the major reason for this ticket by allowing multi-GB partitions. Let's focus on that rather than officially baking a workaround for large partitions into CQL.

          bcoverston Benjamin Coverston added a comment -

          I'm also -1 on adding UDFs into the mix, just on the merits of losing token-aware routing from the client. A simple designation of some of the partition key columns as routing keys would serve the use cases I'm aware of.

          benedict Benedict added a comment - edited

          They wouldn't be providing arbitrary tokens, they would be providing arbitrary input to the hash function (for Random, M3P).

          CREATE FUNCTION myOrderedTokenFct(a bigint) RETURNS bigint AS 'return a';
          CREATE TABLE t (
             a int PRIMARY KEY,
             b text,
             c text
          ) with tokenizer=myOrderedTokenFct;
          

          Basically, this gets you very close to a per-table partitioner. The actual partitioner would just define the "domain" of the tokens and how they sort, but the actual computation would be per-table. And this for very, very little change to the syntax and barely more complexity code-wise than the "routing key" idea.

          It looks to me like these two statements disagree, but I may be mistaken.

          thobbs Tyler Hobbs added a comment -

          However I would point out that letting the user provide an arbitrary token lets them, for instance, break the "order preserving" assumptions of BOP, or the "fair distribution" assumptions of the hash partitioner.

          They wouldn't be providing arbitrary tokens, they would be providing arbitrary input to the hash function (for Random, M3P). The distribution would be approximately as fair as it would be without the transform step.

          For BOP they would maintain the order of whatever the function returns, which makes sense and seems like exactly what the user would want.

          FWIW, I agree with Sylvain's preference for using functions rather than a routing key, for the same reasons he lists.

          benedict Benedict added a comment -

          I think we're just making the same arguments back and forth, so I'll leave it here for now. However, I would point out that letting the user provide an arbitrary token lets them, for instance, break the "order preserving" assumptions of BOP, or the "fair distribution" assumptions of the hash partitioner. Breaking the latter in particular could mean that many future optimizations (e.g. CASSANDRA-7282) would instead degrade such a cluster.

          iamaleksey Aleksey Yeschenko added a comment -

          As it stands now, I'm -1 on involving UDFs here. The use case I have in mind is the only real use case I've heard, from just 2 users. They'd be better served by the less complicated designation of some of the partition key columns for calculating the token and don't need this extra power.

          Don't have much to add, otherwise. The ticket is not yet urgent; there are at least a few months before work on it would start. I'm going to wait for some other use cases before I'm convinced that the full UDF approach makes any sense here, and will put this issue on hold otherwise.

          slebresne Sylvain Lebresne added a comment -

          the partition key distributes the data both within and without the node, whereas the routing key only without

          I honestly don't understand what that sentence means, especially in terms of modeling (the concept of distribution within a node sounds an awful lot like getting into implementation details). I know I'm not very smart, but let's say I'm still not sold on the whole simplicity of explaining the concept.

          There are also two things that seem to be conflated in your proposal: per table partitioners, and arbitrary functions as partitioners.

          I'm not sure why you're trying to find complexity in what I'm suggesting. Technically, the routing key idea just says that for a specific table, instead of applying the "default" partitioner hash function to the partition key to compute the token, we'll use a function that first projects some part of said partition key and then applies the hash function. It is using a custom token function, just a very special one. I'm only suggesting we allow any function instead of just either the default or another very special function. There is nothing more to do with midpoint calculation, random token creation and whatnot than with the routing key idea.

          I'm not in any way suggesting per-table partitioners. I don't ever want to do that, because it's a lot of complexity that I'm really not convinced is worth it. What I am saying is that by allowing a generic custom token function (instead of just syntax for one specific custom function), we'll likely cover most of the use cases for a per-table partitioner (probably not all, but most). And this with virtually no added complexity compared to the routing key idea.

          However we can deliver a lot of the functionality you suggest with just arbitrary function application to the fields in the partition key when defining the routing key.

          That's almost exactly what I'm suggesting, except that by making it just one function on the whole partition key, it's actually more flexible and you don't have to introduce two concepts: the routing key, and then functions on routing key elements.

          benedict Benedict added a comment -

          invalidate less documentation/existing assumptions

          But we won't invalidate them: it will still be true of the partition key; the routing key would always be a subset of the partition key, so the statements still hold true. The difference is that the partition key distributes the data both within and without the node, whereas the routing key only without. So it's a refinement rather than a rewrite/invalidation.

          Besides, that's really only one of my points.

          There are also two things that seem to be conflated in your proposal: per table partitioners, and arbitrary functions as partitioners. The latter is more problematic than the former, since we need to know certain things about the token distribution, such as order preservation, midpoint calculation, random token creation; even ring description is apparently specialized (perhaps this can be abstracted, not sure).

          However, we can deliver a lot of the functionality you suggest with just arbitrary function application to the fields in the partition key when defining the routing key. I don't think this should be in the initial version, for the record, but defining PRIMARY KEY (( [truncate(a),b] a, b), ...) would achieve the same goal.

          Permitting per-table IPartitioner declarations also seems like a good thing to support, but seems a different goal to me; that's an even lower level decision, and really all you want is hashed/partitioned. But you want those to be good at their jobs; if you screw that up, C* may behave unexpectedly.

          slebresne Sylvain Lebresne added a comment -

          My point is that from a data modelling perspective, being able to define the values on which you distribute is the concept you care about.

          Then we agree. But my problem is that this is exactly what the partition key is about; it's its purpose, how we explain and define it. Changing that purpose now is confusing (and if that's not the purpose of the partition key anymore, I'm not even sure what purpose it actually has, or how to define it simply).

          Which is why I'm convinced we'll create less confusion and invalidate less documentation/existing assumptions by simply adding an option to define the token function. In that case, the fundamental concepts stay the same and the partition key still defines the values used for distribution. But the exact way they are used, which already depends on the partitioner today, gains some more flexibility as it can be user defined. The fact that you can write functions that use only some of those values becomes an implementation detail; the "concept" of the partition key is preserved. I don't think changing the meaning of fundamental concepts, or multiplying them, is a good idea.

          Besides, that's really only one of my points. We have had people wanting to do fancy things with the partitioner many times, but so far the fact that the partitioner is cluster-wide, and that making it per-table is pretty annoying, has limited what can be done. The use case in the description is really just one special case. Assuming that it's the only smart thing we can do when computing the token from the partition key feels a bit short-sighted to me. It's an advanced feature for power users anyway, so let's at least make it powerful.

          benedict Benedict added a comment -

          The token is an "implementation detail" for the concept of routing, or fair distribution. Perhaps we have different definitions of implementation detail, but I stand by it under my nomenclature, and the presence of a token function doesn't really change that.

          My point is that from a data modelling perspective, being able to define the values on which you distribute is the concept you care about. The token that is ultimately used to deliver that is not important for you when modelling your system.

          slebresne Sylvain Lebresne added a comment -

          What I'm talking about is basically the idea of CASSANDRA-5054. Or to put it another way, we could use a function like:

          CREATE FUNCTION myTokenFct(a int, b int) RETURNS bigint AS
          $$
              long high = murmur3(a);
              long low = murmur3(b);
              return (high & 0xFFFFFFFF00000000L) | (low & 0x00000000FFFFFFFFL);
          $$;
          

          The goal being to make it likely that partitions with the same value for a are on a small number of nodes, but without forcing everything onto the same node (the latter having a fair amount of foot-shooting potential). But that's really just an example. You could imagine actually having a specific table that is "ordered" (in a predictable way) without having to use ByteOrderPartitioner for the whole cluster:

          CREATE FUNCTION myOrderedTokenFct(a bigint) RETURNS bigint AS 'return a';
          CREATE TABLE t (
             a int PRIMARY KEY,
             b text,
             c text
          ) with tokenizer=myOrderedTokenFct;
          

          Basically, this gets you very close to a per-table partitioner. The actual partitioner would just define the "domain" of the tokens and how they sort, but the actual computation would be per-table. And this for very, very little change to the syntax and barely more complexity code-wise than the "routing key" idea.
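The first function above can be illustrated with a Python sketch (illustration only, not Cassandra code; md5 stands in for murmur3 and all names are hypothetical): combining the high bits of one hash with the low bits of another puts all partitions that share `a` into a single 2^32-wide token band, without pinning them to exactly one token.

```python
import hashlib

def h64(value):
    # Stand-in 64-bit hash (the comment's UDF uses murmur3).
    digest = hashlib.md5(repr(value).encode()).digest()
    return int.from_bytes(digest[:8], "big")

def my_token(a, b):
    # High 32 bits come from a, low 32 bits from b, mirroring the UDF:
    # rows sharing `a` fall into one contiguous 2^32-wide token band.
    high = h64(a)
    low = h64(b)
    return (high & 0xFFFFFFFF00000000) | (low & 0x00000000FFFFFFFF)

t1 = my_token(1, 10)
t2 = my_token(1, 20)
t3 = my_token(2, 10)

assert t1 >> 32 == t2 >> 32   # same `a`: same band, hence few nodes
assert t1 >> 32 != t3 >> 32   # different `a`: different band
```

How many nodes a band maps to then depends on how token ranges are assigned, which is the point: nearby but not identical tokens spread load without full scatter.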

          Of course, this will be an advanced feature that people should use at their own risk. But that's true of the "routing key" idea too: we'd better label it as an advanced feature or I'm certain people will misuse it and shoot themselves in the foot more often than not. This is also why I'm not too worried about the driver side: it's simple to say that if you use a custom token function, which will be rare in the first place, then you have to provide it to the driver too to get token awareness (which is not to say this isn't a downside, but it's a very small one in practice and given the context).

          Perhaps more importantly, I think the function idea is conceptually simpler than the routing key idea. All that you basically have to say is that we allow you to define the token function on a per-table basis, the exact same function that already exists and can be used in SELECT.
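The per-table token function amounts to a lookup from table to function, falling back to the cluster-wide default. A minimal sketch, with every name hypothetical and md5 standing in for the default hash:

```python
import hashlib

def default_token(key):
    # Cluster-wide default: hash-based, like Murmur3Partitioner.
    digest = hashlib.md5(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def ordered_token(key):
    # An order-preserving token function, like myOrderedTokenFct above.
    (a,) = key
    return a

# Hypothetical per-table registry, i.e. the `tokenizer` table option.
token_functions = {"t": ordered_token}

def token_for(table, key):
    # Tables without an override keep the default partitioner hash.
    return token_functions.get(table, default_token)(key)

assert token_for("t", (5,)) == 5                      # ordered: token == key
assert token_for("t", (1,)) < token_for("t", (2,))    # order preserved
```

The partitioner still owns the token domain and its sort order; only the key-to-token computation varies per table, which is what keeps the change small.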

          While the routing key concept (or whatever name we would pick) is imo more confusing. You have to explain that on top of the primary key having a subpart that is the partition key, you also have a subpart of the latter which is now the routing key. And how do you define what the partition key is now, in simple terms? Well, I don't know, because once you have a routing key that is different from the partition key, the partition key starts to be kind of an implementation detail. It's the "thing" that doesn't really determine where the row is distributed, but is not part of the clustering, so you can't query it like a clustering column because ... because?

          Honestly, allowing a custom token function per table is 1) more powerful and 2) imo much easier to explain conceptually, and this without fuzzing existing concepts. So I'm -1 on the routing key concept unless I'm shown that the custom token function idea doesn't work, is substantially more complex to implement, or has fundamental flaws I have missed. I would hate to add the routing key idea only to realize that some other user has a clever "routing" idea that is just not handled by the routing key (and have to add some new custom concept).

          the distinct concept of "token" (which is more an implementation detail, IMO)

          Your opinions are your own, but the "token" is most definitely not an implementation detail, since 1) we have a token function in CQL to compute it and 2) we reference it all the time in the documentation, have scores of options that mention it, it's exposed by drivers, etc. Actually, the fact that we would use the token concept rather than adding a new custom one is part of why I'm convinced it's conceptually simpler: everyone who knows Cassandra knows of tokens.

          iamaleksey Aleksey Yeschenko added a comment -

          I also want to add that if we did choose this way (routing key as part of the partition key), I'd vote for DESCRIBE not indicating the routing part if it exactly matches the whole partition key.

          Most users won't be confused and won't need to know about the distinction unless they explicitly use the functionality. It's okay to hide it, it being a relatively advanced opt-in feature.

          benedict Benedict added a comment -

          Personally I think it is clearer having a "routing key" as part of the primary key than having a special tokenizer function. It's also syntactically cleaner. Since the user understands the indirection of clustering versus partition key, it isn't a tall order for them to understand a routing key, and it fits more neatly into a mental model than the distinct concept of "token" (which is more an implementation detail, IMO). I agree it is marginally less general, but it's not mutually exclusive. It is possible for us in future to support function application to fabricate a "column" inside the routing key declaration only.

          iamaleksey Aleksey Yeschenko added a comment -

          Except that it's not all the same result that I described.

          Can you give me an example then? Ideally something that the driver would still be able to understand.

          slebresne Sylvain Lebresne added a comment -

          so they can reorder/split them as necessary and get the same result

          Except that it's not all the same result that I described.

          iamaleksey Aleksey Yeschenko added a comment -

          You'd be able to use more than one component of the partition key. Using the originally proposed syntax (strictly as an example) you could have PRIMARY KEY (([a, b, c], d), e, f). Ultimately, for non-routing purposes, the order of the columns in the partition key doesn't matter at all, and the user has full control, so they can reorder/split them as necessary and get the same result.

          slebresne Sylvain Lebresne added a comment - edited

          I have an equally strong preference to not overcomplicate and overgeneralise this

          Well, I disagree that it's over-generalization, it's just generalization, and generalization doesn't always mean more complexity. In fact, it's imo simpler to use functions than to come up with a new custom concept. Perhaps more importantly, I think that something potentially more useful than just using one component of the partition key would be to use both components, but use the first one only for the first half of the token and the second one for the second half. The result being that partitions having the same first component would be on the same replica or some small number of replicas, but with still some scaling properties if you have very many partitions sharing the same first component.

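The "half-and-half" token idea above can be sketched in a few lines. This is a hypothetical illustration, not anything Cassandra implements: the `SplitToken` class, the 32/32 bit split, and the use of `Objects.hashCode` as a stand-in for Murmur3 are all assumptions made to keep the sketch short.

```java
import java.util.Objects;

public class SplitToken {
    // Hypothetical sketch: the high 32 bits of the 64-bit token come from the
    // first partition-key component, the low 32 bits from the second.
    // Objects.hashCode stands in for Murmur3 purely for brevity.
    static long token(Object a, Object b) {
        long high = Integer.toUnsignedLong(Objects.hashCode(a));
        long low  = Integer.toUnsignedLong(Objects.hashCode(b));
        return (high << 32) | low;
    }

    public static void main(String[] args) {
        long t1 = token("user42", 1);
        long t2 = token("user42", 2);
        long t3 = token("other", 1);
        // Same first component: same high bits, so both tokens fall inside one
        // contiguous token range owned by a small number of replicas.
        System.out.println((t1 >>> 32) == (t2 >>> 32)); // prints "true"
        // Different first component: a different range (barring collisions).
        System.out.println((t1 >>> 32) == (t3 >>> 32)); // prints "false"
    }
}
```

Note that partitions sharing the first component cluster into one token *range* rather than one exact token, which is what would give the "same replica or some small number of replicas, but with still some scaling properties" behavior described above.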
          iamaleksey Aleksey Yeschenko added a comment -

          I have an equally strong preference to not overcomplicate and overgeneralise this, and just dedicate part of the partition key to routing, not use functions.

          Don't have to call it a 'routing key', and I'm open to other syntax suggestions though.

          snazy Robert Stupp added a comment -

          Just want to prevent drivers from having to implement the whole UDF exec machinery (which could be difficult for non-Java drivers).

          Drivers could possibly accept "native" functions from the client code to calculate the routing key if they really need to optimize for token-aware routing.

          slebresne Sylvain Lebresne added a comment -

          Not automagically, but it's easy enough to make drivers accept custom functions for token-aware routing. And I'm fine providing a couple of native functions for the most common cases (like the "use only the ith component of the partition key" of the description), which drivers could recognize automagically if they want to. That would still leave the ability to do more complex stuff.

          snazy Robert Stupp added a comment -

          Using UDFs for the routing-key looks nice. But I doubt that drivers would be able to compute the routing-key for token-aware routing.

          slebresne Sylvain Lebresne added a comment -

          If we do this, I have a strong preference for exposing it as a way to define a custom function for computing the token. So the example above would be written something like:

          CREATE FUNCTION myCustomHash(a int, b int) RETURNS bigint AS 'return murmur3(a)';
          
          CREATE TABLE foo (
              a int,
              b int,
              c int,
              d int,
              PRIMARY KEY ((a, b), c)
          ) WITH tokenizer=myCustomHash;
          

          That's imo more generic and I don't like adding a notion of "routing key" when we already have "primary key" and "partition key" which is enough "key" (and internally the "routing key" is really just the token, so no point in having a new notion).

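To see the semantics of the proposed per-table tokenizer concretely, here is a hedged Java sketch. The `Tokenizer` interface and `MY_CUSTOM_HASH` constant are invented for illustration (no such API exists in Cassandra), and `Objects.hashCode` stands in for the `murmur3` call in the UDF body above:

```java
import java.util.Objects;

public class CustomTokenizer {
    // Invented interface: a per-table function from the full partition key to
    // a token. This only illustrates the WITH tokenizer=myCustomHash example.
    interface Tokenizer {
        long token(Object[] partitionKey);
    }

    // Like the myCustomHash UDF: the token depends only on component a, so
    // rows sharing a are co-located on the same node even when b differs.
    static final Tokenizer MY_CUSTOM_HASH =
        pk -> (long) Objects.hashCode(pk[0]);

    public static void main(String[] args) {
        long t1 = MY_CUSTOM_HASH.token(new Object[]{ 7, 1 });  // (a=7, b=1)
        long t2 = MY_CUSTOM_HASH.token(new Object[]{ 7, 2 });  // (a=7, b=2)
        System.out.println(t1 == t2); // prints "true": same replica owns both
    }
}
```

A driver that knows the table's tokenizer could apply the same function client-side for token-aware routing, which is exactly the concern raised in the comments above.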
          iamaleksey Aleksey Yeschenko added a comment -

          Additionally, when/if we have CASSANDRA-8857, we'd be able to meaningfully batch partition lookups to different tables so long as the routing key is the same, in a single roundtrip, relying on their co-locality.

          iamaleksey Aleksey Yeschenko added a comment -

          I've got a couple more use cases for the feature.

          If we implement this, we'll start grouping Mutation objects by {keyspace, routing key} tuples instead of {keyspace, partition key} tuples, as we do now. This means that for tables that share the same routing key, but different remaining partition keys, we'd now be able to put them in the same Mutation, and add both updates atomically to the commitlog.

          This would allow us to get batchlog functionality basically for free for the updates that share the same routing key, be it the same table or several different ones.

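The commitlog grouping described above can be sketched as a toy model. Everything here is invented for illustration and is not Cassandra's actual Mutation code: the `MutationGrouping` class, the `{routingKey, payload}` string-pair encoding of an update, and the method name are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MutationGrouping {
    // Toy model of the proposed change: bucket updates by {keyspace, routing
    // key} instead of {keyspace, partition key}. Updates sharing a routing key
    // land in one bucket (one Mutation, one atomic commitlog append) even if
    // the rest of their partition key, or their target table, differs.
    static Map<List<String>, List<String>> groupByRoutingKey(String keyspace,
                                                             List<String[]> updates) {
        Map<List<String>, List<String>> mutations = new LinkedHashMap<>();
        for (String[] u : updates)  // u[0] = routing key, u[1] = payload
            mutations.computeIfAbsent(List.of(keyspace, u[0]),
                                      k -> new ArrayList<>()).add(u[1]);
        return mutations;
    }

    public static void main(String[] args) {
        List<String[]> updates = List.of(
            new String[]{ "a1", "t1:row1" },   // routing key a1, table t1
            new String[]{ "a1", "t2:row2" },   // routing key a1, table t2
            new String[]{ "a2", "t1:row3" });  // routing key a2
        // Three updates collapse into two mutations; the two a1 updates are
        // applied atomically, which is the "free batchlog" effect above.
        System.out.println(groupByRoutingKey("ks", updates).size()); // prints "2"
    }
}
```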

            People

            • Assignee: Unassigned
            • Reporter: mbroecheler Matthias Broecheler
            • Votes: 2
            • Watchers: 25
