Cassandra / CASSANDRA-3114

After Choosing EC2Snitch you can't migrate off w/o a full cluster restart

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Once you choose the Ec2Snitch, gossip messages will trigger this exception if you try to move (for example) to the PropertyFileSnitch:

      ERROR [pool-2-thread-11] 2011-08-30 16:38:06,935 Cassandra.java (line 3041) Internal error processing get_slice
      java.lang.NullPointerException
      at org.apache.cassandra.locator.Ec2Snitch.getDatacenter(Ec2Snitch.java:84)
      at org.apache.cassandra.locator.DynamicEndpointSnitch.getDatacenter(DynamicEndpointSnitch.java:122)
      at org.apache.cassandra.service.DatacenterReadCallback.assureSufficientLiveNodes(DatacenterReadCallback.java:77)
      at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:516)
      at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:480)
      at org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:109)
      at org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:263)
      at org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:345)
      at org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:306)
      at org.apache.cassandra.thrift.Cassandra$Processor$get_slice.process(Cassandra.java:3033)
      at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
      at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
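
      For concreteness, the migration that triggers this is just the snitch change in cassandra.yaml, picked up on restart while the node's peers are still gossiping Ec2Snitch state (the unqualified name resolves against org.apache.cassandra.locator):

      cassandra.yaml
          # before: endpoint_snitch: Ec2Snitch
          endpoint_snitch: PropertyFileSnitch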

          Activity

          Brandon Williams added a comment -

          I'm not sure there's a good solution here. We could make PFEPS inject the local node's dc/rack info into gossip, similar to what I suggested in CASSANDRA-1974, but you'd still have to name things with the ec2snitch conventions for things to not break, and it would be very PFEPS-specific; other snitches are out of the question.

          Ultimately I'm inclined to say you need to choose your snitch like you choose your partitioner: very carefully.
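
          (To make "the ec2snitch conventions" concrete: Ec2Snitch derives its names from the node's availability zone, e.g. zone us-east-1a becomes datacenter us-east and rack 1a, so a PropertyFileSnitch configuration that didn't reshuffle replicas would have to reproduce those names. A sketch only, with placeholder addresses:)

          cassandra-topology.properties
              # DC:rack names must match what Ec2Snitch was already gossiping
              10.0.0.1=us-east:1a
              10.0.0.2=us-east:1b
              default=us-east:1a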

          Jackson Chung added a comment -

          What if we do this in AbstractEndpointSnitch?

          AbstractEndpointSnitch.java
              public void gossiperStarting()
              {
                  // Publish the local node's DC/rack into gossip so any snitch on a
                  // peer can resolve this endpoint's topology from application state.
                  String dc = getDatacenter(FBUtilities.getBroadcastAddress());
                  String rack = getRack(FBUtilities.getBroadcastAddress());
                  logger.info(this.getClass().getSimpleName() + " adding ApplicationState DC=" + dc + " Rack=" + rack);
                  Gossiper.instance.addLocalApplicationState(ApplicationState.DC, StorageService.instance.valueFactory.datacenter(dc));
                  Gossiper.instance.addLocalApplicationState(ApplicationState.RACK, StorageService.instance.valueFactory.rack(rack));
              }
          Brandon Williams added a comment -

          I don't see how making your dc/rack names your external IP address is going to solve anything.

          Jackson Chung added a comment -

          "but you'd still have to name things with the ec2snitch conventions for things to not break" still hold true with the above.

          Jackson Chung added a comment - edited

          "I don't see how making your dc/rack names your external IP address is going to solve anything."

          Well, the NPE was on:

          return Gossiper.instance.getEndpointStateForEndpoint(endpoint).getApplicationState(ApplicationState.DC).value;
          

          The given endpoint is not the local address; it's the address of the "other" nodes. For those "other" nodes, if they are not using the Ec2Snitch (which would have populated ApplicationState.DC and ApplicationState.RACK with values), getApplicationState(ApplicationState.DC) (and getApplicationState(ApplicationState.RACK), for that matter) is going to return null. Hence the NPE from that line, on .value.

          Defaulting the AbstractEndpointSnitch's gossiperStarting to populate ApplicationState.DC and ApplicationState.RACK will then help any snitch that relies on the gossip info for getDatacenter and getRack.
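
          (Illustratively, the failing line could also guard against the missing state instead of assuming it. This is a rough sketch only, not the committed fix; the eventual resolution came via CASSANDRA-3186, and ec2region here is assumed to be Ec2Snitch's locally discovered region field:)

          Ec2Snitch.java
              public String getDatacenter(InetAddress endpoint)
              {
                  if (endpoint.equals(FBUtilities.getBroadcastAddress()))
                      return ec2region; // region discovered from EC2 metadata at startup
                  EndpointState state = Gossiper.instance.getEndpointStateForEndpoint(endpoint);
                  // Peers not running Ec2Snitch never gossip a DC, so guard before dereferencing .value.
                  if (state == null || state.getApplicationState(ApplicationState.DC) == null)
                      return ec2region; // fallback masks the misconfiguration rather than fixing it
                  return state.getApplicationState(ApplicationState.DC).value;
              }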

          Brandon Williams added a comment -

          Defaulting the AbstractEndpointSnitch's gossiperStarting to populate ApplicationState.DC and ApplicationState.RACK will then help any snitch that relies on the gossip info for getDatacenter and getRack.

          Yes, but setting DC to 'foo' and rack to 'bar' just creates a new DC and rack and breaks the replication policy and consistency guarantees.
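
          (Concretely: NetworkTopologyStrategy places replicas per named datacenter, so a keyspace defined against the EC2 names carries no replica count for an invented DC. A rough cassandra-cli sketch; exact syntax varies across versions of that era:)

          create keyspace demo
            with placement_strategy = 'NetworkTopologyStrategy'
            and strategy_options = {us-east : 3};

          A node that starts gossiping DC 'foo' falls outside the us-east replica set, so local-DC reads and writes can no longer find the replicas they expect.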

          Brandon Williams added a comment -

          Closing, see CASSANDRA-3186


            People

            • Assignee: Unassigned
            • Reporter: Benjamin Coverston
            • Votes: 1
            • Watchers: 5
