Cassandra
  1. Cassandra
  2. CASSANDRA-1777

The describe_host API method is misleading in that it returns the interface associated with gossip traffic

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 0.8.4
    • Component/s: None
    • Labels:
      None

      Description

      If the hardware is configured to use separate interfaces for thrift and gossip, the gossip interface will be returned, given the results come out of the ReplicationStrategy eventually.

      I understand the approach, but given this is on the API, it effective worthless in situations of host auto discovery via describe_ring from a client. I actually see this as the primary use case of this method - why else would I care about the gossip iface from the client perspective? It's current form should be relegated to JMX only.

      At the same time, we should add port information as well.

      describe_splits probably has similar issues.

      I see the potential cart-before-horse issues here and that this will probably be non-trivial to fix, but I think "give me a set of all the hosts to which I can talk" is pretty important from a client perspective.

      1. 1777.txt
        6 kB
        Brandon Williams
      2. 1777-v2.txt
        5 kB
        Brandon Williams

        Issue Links

          Activity

          Hide
          Brandon Williams added a comment -

          This is actually simple enough to apply to 0.7 if we leave out the rpc port (since that would require a thrift api change.) describe_splits doesn't appear to be affected, just describe_ring.

          Show
          Brandon Williams added a comment - This is actually simple enough to apply to 0.7 if we leave out the rpc port (since that would require a thrift api change.) describe_splits doesn't appear to be affected, just describe_ring.
          Hide
          Jonathan Ellis added a comment -

          gossip is for internal cluster state, we definitely shouldn't be using it for rpc_address and rpc_port.

          90% convinced the answer is "don't use describe_ring for node discovery, that's what RRDNS is for."

          Show
          Jonathan Ellis added a comment - gossip is for internal cluster state, we definitely shouldn't be using it for rpc_address and rpc_port. 90% convinced the answer is "don't use describe_ring for node discovery, that's what RRDNS is for."
          Hide
          Brandon Williams added a comment -

          You can't build a routing-aware client from RRDNS though because you'll have no way to map the IP to the token. describe_ring really is useless if the gossip iface isn't the same as the rpc address right now (describe_ring is already using gossip for the info it returns, it's just the wrong info.)

          Show
          Brandon Williams added a comment - You can't build a routing-aware client from RRDNS though because you'll have no way to map the IP to the token. describe_ring really is useless if the gossip iface isn't the same as the rpc address right now (describe_ring is already using gossip for the info it returns, it's just the wrong info.)
          Hide
          Ryan King added a comment -

          Unless you have a dns server that can understand cassandra membership, RRDNS is actually a rough way to do this. I'd prefer to supply something for clients that works correctly.

          Show
          Ryan King added a comment - Unless you have a dns server that can understand cassandra membership, RRDNS is actually a rough way to do this. I'd prefer to supply something for clients that works correctly.
          Hide
          Eric Evans added a comment -

          I don't see how you could build a non-Java routing-aware client without replicating the partitioner and replica placement client-side. Ick.

          Show
          Eric Evans added a comment - I don't see how you could build a non-Java routing-aware client without replicating the partitioner and replica placement client-side. Ick.
          Hide
          Ryan King added a comment -

          I don't care about making it routing-aware. I just want to do discovery.

          Show
          Ryan King added a comment - I don't care about making it routing-aware. I just want to do discovery.
          Hide
          Nick Bailey added a comment -

          Besides just auto_discovery this breaks the pig storage func in contrib if you use different interfaces. That class uses describe ring to generate the input splits for map tasks to work on, which obviously doesn't work if it returns the gossip interface instead of the thrift interface.

          +1 on fixing this and backporting to 0.7

          Show
          Nick Bailey added a comment - Besides just auto_discovery this breaks the pig storage func in contrib if you use different interfaces. That class uses describe ring to generate the input splits for map tasks to work on, which obviously doesn't work if it returns the gossip interface instead of the thrift interface. +1 on fixing this and backporting to 0.7
          Hide
          Brandon Williams added a comment -

          Since the problem is rooted in CFIF, this actually runs deeper than just Pig.

          Show
          Brandon Williams added a comment - Since the problem is rooted in CFIF, this actually runs deeper than just Pig.
          Hide
          Jonathan Ellis added a comment -

          Brandon, what is the status here?

          Show
          Jonathan Ellis added a comment - Brandon, what is the status here?
          Hide
          Brandon Williams added a comment -

          describe_ring, as is, is completely useless if you don't have thrift bound to the same interface as the storage proto, so I stand by the change to advertise the thrift address instead.

          Show
          Brandon Williams added a comment - describe_ring, as is, is completely useless if you don't have thrift bound to the same interface as the storage proto, so I stand by the change to advertise the thrift address instead.
          Hide
          Jonathan Ellis added a comment -

          is that what the attached patch does?

          Show
          Jonathan Ellis added a comment - is that what the attached patch does?
          Hide
          Brandon Williams added a comment -

          That's what it did 8 months ago It gets the rpc address/port information for each machine via gossip and returns that in describe_ring.

          Show
          Brandon Williams added a comment - That's what it did 8 months ago It gets the rpc address/port information for each machine via gossip and returns that in describe_ring.
          Hide
          Jonathan Ellis added a comment -

          oh yeah. the whole "gossip is for internal cluster state, we definitely shouldn't be using it for rpc_address and rpc_port" thing I objected to originally.

          however, I don't have a better solution (asking nodes directly would mean we couldn't include nodes that are currently unreachable), so +1 I guess on the general approach.

          if we're not going to expose individual rpc_port can we just require that it be the same cluster-wide and not bother gossiping it? it's kind of a misfeature anyway to allow different ones.

          the old getRangeToEndpointMap is unused and can be removed now right?

          Committing to 0.8 is probably fine but let's leave 0.7 alone.

          Show
          Jonathan Ellis added a comment - oh yeah. the whole "gossip is for internal cluster state, we definitely shouldn't be using it for rpc_address and rpc_port" thing I objected to originally. however, I don't have a better solution (asking nodes directly would mean we couldn't include nodes that are currently unreachable), so +1 I guess on the general approach. if we're not going to expose individual rpc_port can we just require that it be the same cluster-wide and not bother gossiping it? it's kind of a misfeature anyway to allow different ones. the old getRangeToEndpointMap is unused and can be removed now right? Committing to 0.8 is probably fine but let's leave 0.7 alone.
          Hide
          Brandon Williams added a comment -

          if we're not going to expose individual rpc_port can we just require that it be the same cluster-wide and not bother gossiping it? it's kind of a misfeature anyway to allow different ones.

          I agree, I'll remove the port.

          the old getRangeToEndpointMap is unused and can be removed now right?

          I think I originally left it in because it's exposed over JMX and I didn't want to break it if anyone was using it for something.

          Show
          Brandon Williams added a comment - if we're not going to expose individual rpc_port can we just require that it be the same cluster-wide and not bother gossiping it? it's kind of a misfeature anyway to allow different ones. I agree, I'll remove the port. the old getRangeToEndpointMap is unused and can be removed now right? I think I originally left it in because it's exposed over JMX and I didn't want to break it if anyone was using it for something.
          Hide
          Brandon Williams added a comment -

          v2 only communicate the rpc address over gossip, and fixes a problem in v1 when the rpc address is left blank.

          Show
          Brandon Williams added a comment - v2 only communicate the rpc address over gossip, and fixes a problem in v1 when the rpc address is left blank.
          Hide
          Jonathan Ellis added a comment -

          +1, but let's remove getRangeToEndpointMap when you merge to trunk.

          Show
          Jonathan Ellis added a comment - +1, but let's remove getRangeToEndpointMap when you merge to trunk.
          Hide
          Brandon Williams added a comment -

          Committed.

          Show
          Brandon Williams added a comment - Committed.
          Hide
          Hudson added a comment -

          Integrated in Cassandra-0.8 #254 (See https://builds.apache.org/job/Cassandra-0.8/254/)
          describe_ring returns the interface thrift is bound to instead of the
          one the storage proto is bound to.
          Patch by brandonwilliams, reviewed by jbellis for CASSANDRA-1777

          brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1153683
          Files :

          • /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/VersionedValue.java
          • /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/CassandraServer.java
          • /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageService.java
          • /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/ApplicationState.java
          Show
          Hudson added a comment - Integrated in Cassandra-0.8 #254 (See https://builds.apache.org/job/Cassandra-0.8/254/ ) describe_ring returns the interface thrift is bound to instead of the one the storage proto is bound to. Patch by brandonwilliams, reviewed by jbellis for CASSANDRA-1777 brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1153683 Files : /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/VersionedValue.java /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/CassandraServer.java /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageService.java /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/ApplicationState.java

            People

            • Assignee:
              Brandon Williams
              Reporter:
              Nate McCall
              Reviewer:
              Jonathan Ellis
            • Votes:
              3 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 16h
                16h
                Remaining:
                Remaining Estimate - 16h
                16h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development