CASSANDRA-3881

reduce computational complexity of processing topology changes

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 beta 1
    • Component/s: Core
    • Labels: vnodes

      Description

      This constitutes follow-up work from CASSANDRA-3831 where a partial improvement was committed, but the fundamental issue was not fixed. The maximum "practical" cluster size was significantly improved, but further work is expected to be necessary as cluster sizes grow.

      Edit0: Appended patch information.

      Patches

      Compare                    | Raw diff                         | Description
      00_snitch_topology         | 00_snitch_topology.patch         | Adds some functionality to TokenMetadata to track which endpoints and racks exist in a DC.
      01_calc_natural_endpoints  | 01_calc_natural_endpoints.patch  | Rewritten O(logN) implementation of calculateNaturalEndpoints using the topology information from the tokenMetadata.

      Note: These are branches managed with TopGit. If you are applying the patch output manually, you will either need to filter the TopGit metadata files (i.e. wget -O - <url> | filterdiff -x*.topdeps -x*.topmsg | patch -p1), or remove them afterward (rm .topmsg .topdeps).

        Issue Links

          Activity

          Peter Schuller created issue -
          Peter Schuller made changes -
          Field Original Value New Value
          Link This issue relates to CASSANDRA-3831 [ CASSANDRA-3831 ]
          Peter Schuller added a comment -

          Even with post-CASSANDRA-3831 code, the need here is great. When bootstrapping a new node into a pre-existing cluster of 200+ nodes, the time the node being bootstrapped spends CPU bound in the gossip stage is so long that we're close to seeing it initiate the bootstrap streaming process before it is done spinning in the gossip stage.

          We're increasing the bootstrap wait delay for now to work around this.

          Peter Schuller added a comment -

          Another work-around, still hacky, is to wait for the gossip stage pending count to also reach 0 (after waiting for the ring delay) before proceeding with bootstrap.

          Peter Schuller added a comment -

          Apologies for spamming, but I want to make clear: this applies when bootstrapping nodes into a ring with at least one other node already bootstrapping (otherwise calculatePendingRanges won't get called repeatedly as the ring is populated in the storage service). The greater the number of bootstrapping nodes, the greater the probability that pending range calculations have to be done more often (it is determined by how many nodes get populated into the storage service's notion of the ring before the first bootstrapping node is seen), and the greater the probability of initiating bootstrap based on an incomplete ring view.

          Sam Overton made changes -
          Assignee Peter Schuller [ scode ] Sam Overton [ soverton ]
          Sam Overton added a comment -

          First the important bit: with these patches, StorageService.calculatePendingRanges is almost three orders of magnitude faster (22ms vs 14.6sec) when calculating ranges for two nodes bootstrapping into a cluster of 2048 nodes. See the graph here. This was tested with 1 DC and 1 rack with RF=2.

          The problem lies in NetworkTopologyStrategy.calculateNaturalEndpoints. The main problems with the existing implementation are:

          1. for each datacentre:
          2. it iterates through all the tokens in the ring at least once
          3. then does an NlogN sort of those tokens
          4. then if number of racks < RF it will iterate through all tokens again because it can't tell if it has exhausted the racks in that DC
          5. then if number of hosts in that DC < RF it will iterate through all tokens again, otherwise it will iterate through until it has RF hosts in that DC

          so it's doing O(DC * (N + NlogN + N + N)) operations just to work out the endpoints for a single token. StorageService.calculatePendingRanges then puts this inside other loops (such as AbstractReplicationStrategy.getAddressRanges) which makes it at least O(N^2*logN).

          These patches fix (1) by iterating through the tokens only once, and processing all DCs simultaneously.

          (2,3&5) relate to knowing which endpoints exist in a given DC, (4) relates to knowing which racks appear in a DC, so the first patch adds this knowledge to the snitch. The second patch makes use of this knowledge to simplify calculateNaturalEndpoints.
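
          For illustration only, a minimal sketch of that single-pass idea (simplified token and endpoint types, rack-awareness omitted, and all names invented here rather than taken from the patch; the actual implementation is in the branches below):

              import java.math.BigInteger;
              import java.net.InetAddress;
              import java.util.Collections;
              import java.util.HashMap;
              import java.util.LinkedHashSet;
              import java.util.List;
              import java.util.Map;
              import java.util.Set;

              import com.google.common.collect.HashMultimap;
              import com.google.common.collect.Multimap;

              public class SinglePassReplicationSketch
              {
                  // Walk the ring once from the search token, filling every DC's replica set
                  // using topology maps that are assumed to be maintained incrementally elsewhere.
                  public static Set<InetAddress> naturalEndpoints(
                          BigInteger searchToken,
                          List<BigInteger> sortedTokens,
                          Map<BigInteger, InetAddress> tokenToEndpoint,
                          Map<InetAddress, String> endpointToDc,
                          Multimap<String, InetAddress> dcEndpoints,
                          Map<String, Integer> rfPerDc)
                  {
                      Set<InetAddress> replicas = new LinkedHashSet<InetAddress>();
                      Multimap<String, InetAddress> dcReplicas = HashMultimap.create();

                      // each DC can contribute at most min(RF, endpoints in that DC) replicas
                      Map<String, Integer> wanted = new HashMap<String, Integer>();
                      int unsatisfiedDcs = 0;
                      for (Map.Entry<String, Integer> e : rfPerDc.entrySet())
                      {
                          int w = Math.min(e.getValue(), dcEndpoints.get(e.getKey()).size());
                          wanted.put(e.getKey(), w);
                          if (w > 0)
                              unsatisfiedDcs++;
                      }

                      // O(logN) jump to the first token at or after the search token
                      int start = Collections.binarySearch(sortedTokens, searchToken);
                      if (start < 0)
                          start = -start - 1;

                      for (int i = 0; i < sortedTokens.size() && unsatisfiedDcs > 0; i++)
                      {
                          BigInteger token = sortedTokens.get((start + i) % sortedTokens.size());
                          InetAddress ep = tokenToEndpoint.get(token);
                          String dc = endpointToDc.get(ep);
                          if (!wanted.containsKey(dc) || dcReplicas.get(dc).size() >= wanted.get(dc))
                              continue; // DC not replicated, or it already has all the replicas it can get
                          if (dcReplicas.put(dc, ep)) // false for another vnode of an endpoint already chosen
                          {
                              replicas.add(ep);
                              if (dcReplicas.get(dc).size() == wanted.get(dc))
                                  unsatisfiedDcs--;
                          }
                      }
                      return replicas;
                  }
              }

          (The real patch additionally tracks racks, so that replicas are spread across distinct racks before a rack is reused.)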

          branch                            | description
          p/3881/00_snitch_topology         | (compare, raw diff) Adds some functionality to AbstractEndpointSnitch to track which endpoints and racks exist in a DC (allows for fixing of 2-5).
          p/3881/01_calc_natural_endpoints  | (compare, raw diff) Rewritten O(logN) implementation of calculateNaturalEndpoints using the topology information from the snitch.

          Note: These branches are managed with TopGit. If you are applying the patch output manually, you will either need to filter the TopGit metadata files ( wget -O - <url> | filterdiff -x*.topdeps -x*.topmsg | patch -p1 ), or remove them afterward ( rm .topmsg .topdeps ).

          Sam Overton made changes -
          Link This issue blocks CASSANDRA-4119 [ CASSANDRA-4119 ]
          Jonathan Ellis added a comment -

          Peter, are you available to review?

          Jonathan Ellis made changes -
          Reviewer scode
          Sam Overton added a comment -

          Just added an additional test to NetworkTopologyStrategyTest.

          The patch links above will be up to date; if you have already downloaded the diff, here is the incremental patch (raw diff).

          Eric Evans made changes -
          Labels vnodes
          Description Appended patch information to the original description (patches table and TopGit note).
          Sam Overton added a comment - - edited

          AbstractEndpointSnitch should handle every state change, not just onChange and onRemove

          incremental patch (raw diff)

          Sam Overton added a comment -

          Some remaining issues:
          1. Handling changes to dc/rack assignment. Currently the topology maps would get out of sync if any endpoint changes dc/rack. This will require Ec2Snitch and PropertyFileSnitch to notify the superclass when they detect a topology change.
          2. Delegate getTopology in DynamicSnitch to the subsnitch.

          Peter Schuller added a comment -

          Jonathan, I'll try to do so within the next few days.

          Sam Overton added a comment -

          The original approach was not quite there. The snitch was tracking the topology of nodes in NORMAL state for the benefit of NTS.calculateNaturalEndpoints, but calculateNaturalEndpoints is called with modified TokenMetadata (eg, with leaving nodes removed, or a bootstrapped node added or some other modification) to calculate ranges for some future state of the ring, not the current state as tracked by the snitch.

          The correct solution is to have TokenMetadata track the topology of the nodes which it considers to be part of the ring, so that when a tokenMetadata is cloned and modified it also updates its view of the topology. This is also much simpler and cleaner.
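
          As a rough illustration of that idea (a sketch only; the field and method names here are assumed, not the committed code), the topology lives inside TokenMetadata and is copied whenever the TokenMetadata is cloned:

              import java.net.InetAddress;
              import java.util.HashMap;
              import java.util.Map;

              import com.google.common.collect.HashMultimap;
              import com.google.common.collect.Multimap;

              public class TokenMetadataTopologySketch
              {
                  // Tracks which endpoints and racks exist per DC for the ring state that this
                  // TokenMetadata represents. A cloned TokenMetadata gets a copy, so "what-if"
                  // ring states (leaving node removed, bootstrapping node added) carry a
                  // matching topology view.
                  public static class Topology
                  {
                      private final Multimap<String, InetAddress> dcEndpoints = HashMultimap.create();
                      private final Map<String, Multimap<String, InetAddress>> dcRacks =
                              new HashMap<String, Multimap<String, InetAddress>>();

                      // called from the token update path as endpoints join the ring
                      void addEndpoint(InetAddress ep, String dc, String rack)
                      {
                          dcEndpoints.put(dc, ep);
                          Multimap<String, InetAddress> racks = dcRacks.get(dc);
                          if (racks == null)
                          {
                              racks = HashMultimap.create();
                              dcRacks.put(dc, racks);
                          }
                          racks.put(rack, ep);
                      }

                      // copy used when the enclosing TokenMetadata is cloned
                      Topology copy()
                      {
                          Topology t = new Topology();
                          t.dcEndpoints.putAll(dcEndpoints);
                          for (Map.Entry<String, Multimap<String, InetAddress>> e : dcRacks.entrySet())
                              t.dcRacks.put(e.getKey(), HashMultimap.create(e.getValue()));
                          return t;
                      }

                      public Multimap<String, InetAddress> getDatacenterEndpoints() { return dcEndpoints; }
                      public Map<String, Multimap<String, InetAddress>> getDatacenterRacks() { return dcRacks; }
                  }
              }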

          Patches above are updated.

          Sam Overton made changes -
          Description Updated the patch descriptions to reference TokenMetadata rather than AbstractEndpointSnitch, matching the revised approach.
          Jonathan Ellis added a comment - - edited

          This looks reasonable. I have two concerns:

          • Topology synchronization: a mix between "Topology synchronizes internally" and "caller must synchronize externally" is a recipe for trouble. Maybe just synchronizing getDatacenterEndpoints/getDatacenterRacks and returning copies would be enough. Alternatively, we could just say "you must clone TMD before calling calculateNaturalEndpoints" and possibly get rid of all the Topology synchronization (relying on TMD's synchronization on the update path)
          • I think there is a hole in the rack-handling logic in cNE: we only check skippedDcEndpoints when a new rack is found. So if there is (for instance) a single rack in a DC w/ RF=3, we'll add the first endpoint in that rack, then the rest will get added to the skipped list, but never added to replicas.

          00 nits:

          • would like to see javadoc for Topology
          • type specification is not necessary for HMM.create calls (this is why guava prefers factories to constructors)
          • I recognize that you were following precedent here, but I would prefer the param-less TMD constructor to call new TMD(HashBimap.create(), new Topology()) instead of (null, null) + special casing in the 2-param version

          01 nits:

          • @Override is redundant for calculateNaturalEndpoints since parent declares it abstract
          • some "} else" formatting issues in cNE
          • "!skippedDcEndpoints.get(dc).isEmpty()" is redundant since the empty iterator case will just be a no-op in the following loop
          • perhaps dcReplicas would be a better name than replicaEndpoints, for symmetry w/ "replicas"
          • would be cleaner to move the "replicaEndpoints.get(dc).size() >= Math.min(allEndpoints.get(dc).size(), getReplicationFactor)" check into "have we already found all replicas for this dc", instead of playing games w/ mutating replicaEndpoints as an (unimportant) optimization. (Note that "datacenters.containsKey(dc)" remains a sufficient check for "is this a dc we care about at all.")
          Sam Overton added a comment -

          Thanks Jonathan. I've addressed the above in the following commits:
          7eab101 incremental patch (raw diff)
          1474cf0 incremental patch (raw diff)

          (patch links above updated)

          except for the following:

          • Topology synchronization: a mix between "Topology synchronizes internally" and "caller must synchronize externally" is a recipe for trouble. Maybe just synchronizing getDatacenterEndpoints/getDatacenterRacks and returning copies would be enough. Alternatively, we could just say "you must clone TMD before calling calculateNaturalEndpoints" and possibly get rid of all the Topology synchronization (relying on TMD's synchronization on the update path)

          I was trying to avoid any copying, as calculateNaturalEndpoints will be called thousands of times with vnodes in some code paths. I prefer the latter solution of cloning TMD before using it in any method which will use the Topology. The only places where cloning will be necessary to avoid concurrent updates are those where StorageService.instance.tokenMetadata is used directly. I'll update the patches shortly.
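
          (For illustration, the calling convention being described might look roughly like this; it is a sketch, and cloneOnlyTokenMap and the strategy method signature are assumptions about the surrounding codebase rather than part of these patches:)

              import java.net.InetAddress;
              import java.util.List;

              import org.apache.cassandra.dht.Token;
              import org.apache.cassandra.locator.AbstractReplicationStrategy;
              import org.apache.cassandra.locator.TokenMetadata;
              import org.apache.cassandra.service.StorageService;

              public class CloneBeforeUseSketch
              {
                  // Hypothetical helper: compute natural endpoints against a private snapshot
                  // of the shared ring state, never against the live TokenMetadata instance.
                  public static List<InetAddress> endpointsFor(Token token, AbstractReplicationStrategy strategy)
                  {
                      TokenMetadata snapshot = StorageService.instance.getTokenMetadata().cloneOnlyTokenMap();
                      return strategy.calculateNaturalEndpoints(token, snapshot);
                  }
              }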

          • I think there is a hole in the rack-handling logic in cNE: we only check skippedDcEndpoints when a new rack is found. So if there is (for instance) a single rack in a DC w/ RF=3, we'll add the first endpoint in that rack, then the rest will get added to the skipped list, but never added to replicas.

          I think this case is already handled: the subsequent endpoints for that duplicate rack will hit this line first:

              // can we skip checking the rack?
              if (seenRacks.get(dc).size() == racks.get(dc).keySet().size())
          

          and they get added as a replica immediately because we know we have exhausted the racks for that DC. Did I miss something?

          Jonathan Ellis added a comment -

          The only places where cloning will be necessary to avoid concurrent updates are those where StorageService.instance.tokenMetadata is used directly. I'll update the patches shortly.

          Sounds good, I'll have a look when that hits.

          I think this case is already handled: the subsequent endpoints for that duplicate rack will hit this line first

          Ah, right. LGTM.

          Sam Overton added a comment -

          I tried out the different approaches for removing this synchronization requirement on Topology:

          1. Clone TokenMetadata before passing to calculateNaturalEndpoints (raw diff)

          This looks much more hairy than having synchronization - the places where tokenMetadata needs to be cloned look arbitrary and it's much less obvious what the convention is for other people looking at this (eg. why do you need to clone before passing into AbstractReplicationStrategy.getAddressRanges() but not AbstractReplicationStrategy.getPendingAddressRanges()? A: it's because getPendingAddressRanges makes its own clone to update a token).

          2. Return copies from Topology.getDatacenterEndpoints() and Topology.getDatacenterRacks() (raw diff)

          This is less error-prone because Topology now handles all its own synchronization. Unfortunately copying those maps adds a large overhead which removes most of the benefit of this ticket. See the updated graph of calculatePendingRanges vs. cluster size. This is not surprising because these copies require O(N) time.

          3. Document the synchronization requirements of Topology so that it's clear what is necessary (raw diff)

          This is my preferred solution. Currently only calculateNaturalEndpoints uses the TMD.Topology, so requiring callers to synchronize around it seems like a reasonable solution to me. The javadoc should make it obvious to any new uses of it that they need to synchronize.

          David Alves added a comment -

          Hi

          small nit:
          I was using the ctor

          public TokenMetadata(BiMap<Token, InetAddress> tokenToEndpointMap)

          in non-committed code. That ctor has now become

          public TokenMetadata(BiMap<Token, InetAddress> tokenToEndpointMap, Topology topology)

          Since the ctor in Topology is protected, this ctor in TokenMetadata can't be called outside of protected scope from now on. I suggest either broadening the scope of the ctor in Topology or restricting the scope in TokenMetadata.

          Sam Overton added a comment -

          Hi David, is this non-committed code that's part of another ticket?

          David Alves added a comment -

          hi sam

          yes I was using that ctor to test StorageService.effectiveOwnership, included in the CASSANDRA-3047 patch.

          I worked around it, since it makes sense that TokenMetadata receives token->endpoint mappings through updateNormalTokens in order to build the topology. The thing was that while previously the ctor

          TokenMetadata(BiMap<Token, InetAddress> tokenToEndpointMap)

          was usable, now it is not, at least not without getting the topology from another TokenMetadata instance.

          Sam Overton added a comment -

          Ok, I was going to suggest using updateNormalTokens, or you could change ctor visibility in your patch if required.

          Jonathan Ellis added a comment -

          the places where tokenMetadata needs to be cloned look arbitrary and it's much less obvious what the convention is for other people looking at this

          What if we just added an assert (assert tokenMetadata != StorageService.tokenMetadata) to enforce this requirement?

          Granted that we are choosing lesser evils here, but I like that better than trying to reason about synchronized(Topology) mixed with locks mixed with synchronized(bootstrapTokens).
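
          (Sketched out, the proposed guard might look like the following; the exact reference being compared against is an assumption for illustration:)

              import org.apache.cassandra.locator.TokenMetadata;
              import org.apache.cassandra.service.StorageService;

              public class CloneAssertSketch
              {
                  // Hypothetical guard at the top of calculateNaturalEndpoints: reject the live,
                  // globally shared TokenMetadata so that callers are forced to clone it first.
                  static void assertIsSnapshot(TokenMetadata tokenMetadata)
                  {
                      assert tokenMetadata != StorageService.instance.getTokenMetadata()
                          : "clone TokenMetadata before calculating natural endpoints";
                  }
              }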

          Jonathan Ellis added a comment -

          Pushed this to https://github.com/jbellis/cassandra/tree/3881-clone-tmd with some extra synchronization cleanup

          Jonathan Ellis added a comment -

          More general assert instead (against TM.getTopology) pushed to https://github.com/jbellis/cassandra/tree/3881-clone-tmd-2

          Sam Overton added a comment -

          Agreed that having a single lock makes it easier to reason about correctness.

          There was one more extra synchronization to cleanup - pushed to https://github.com/acunu/cassandra/tree/3881-clone-tmd-3.

          I'll get these changes merged into the TopGit branches. They'll possibly need separating out for the two separate patches.

          Jonathan Ellis added a comment -

          I can merge/squash/commit from here.

          Jonathan Ellis added a comment -

          I take it back, history is a mess w/o topgit. Will wait for your update.

          Sam Overton added a comment -

          It's all merged in now, so the patch links in the ticket description are up to date.

          There was one more place in some tests that TMD needed to be cloned: https://github.com/acunu/cassandra/commit/08620c55a77ba1dc257b853610386297ab0c379b

          Jonathan Ellis added a comment -

          thanks, committed!

          Jonathan Ellis made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Reviewer scode jbellis
          Fix Version/s 1.2 [ 12319262 ]
          Resolution Fixed [ 1 ]
          Gavin made changes -
          Workflow no-reopen-closed, patch-avail [ 12652676 ] patch-available, re-open possible [ 12749829 ]
          Gavin made changes -
          Workflow patch-available, re-open possible [ 12749829 ] reopen-resolved, no closed status, patch-avail, testing [ 12757336 ]
          Transition: Open -> Resolved
          Time In Source Status: 145d 7h 53m
          Execution Times: 1
          Last Executer: Jonathan Ellis
          Last Execution Date: 03/Jul/12 19:24

            People

            • Assignee: Sam Overton
            • Reporter: Peter Schuller
            • Reviewer: Jonathan Ellis
            • Votes: 0
            • Watchers: 7
