Cassandra / CASSANDRA-19598

advanced.resolve-contact-points: unresolved hostname being clobbered during reconnection

Details

    • Bug
    • Status: Triage Needed
    • Normal
    • Resolution: Unresolved
    • None
    • Client/java-driver
    • None
    • All
    • None

    Description

      Hello, this is a bug report against version 4.18.0 of the Java driver.

       

      I am running in an environment with 3 Cassandra nodes. Our use case requires redeploying the cluster from the ground up at midnight every day. This means that all 3 nodes become unavailable for a short period, and 3 new nodes with 3 new IP addresses are spun up and placed behind the contact point hostname. We provide a single hostname as a contact point. If advanced.resolve-contact-points is set to FALSE, the Java driver should re-resolve the hostname for every new connection to that node. This works before and during the first redeployment, but the unresolved hostname is then clobbered during the reconnection process and replaced with a resolved IP address, so the driver cannot recover from any later redeployment.
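      To make the distinction concrete, here is a minimal JDK-only sketch of what "unresolved" means for a contact point. It does not use or reproduce the driver's code; the class name, helper names, and the hostname `cassandra.example.internal` are hypothetical. It only shows that an unresolved `InetSocketAddress` keeps the hostname, so a fresh DNS lookup can be done for each new connection, whereas a resolved address is pinned to one IP.

```java
import java.net.InetSocketAddress;

final class ContactPointAddresses {

    // Keep the hostname without performing a DNS lookup.
    static InetSocketAddress unresolvedContactPoint(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    // Perform a fresh DNS lookup if (and only if) the address still carries
    // an unresolved hostname. If the lookup fails, the JDK returns an address
    // that is still flagged as unresolved rather than throwing.
    static InetSocketAddress resolveNow(InetSocketAddress address) {
        if (!address.isUnresolved()) {
            return address; // already pinned to an IP; nothing to re-resolve
        }
        return new InetSocketAddress(address.getHostString(), address.getPort());
    }

    public static void main(String[] args) {
        // Hypothetical hostname, for illustration only.
        InetSocketAddress contactPoint =
            unresolvedContactPoint("cassandra.example.internal", 9042);

        System.out.println("unresolved: " + contactPoint.isUnresolved());
        System.out.println("hostname kept: " + contactPoint.getHostString());

        // Each call triggers a new lookup, so a changed DNS record is picked up.
        InetSocketAddress current = resolveNow(contactPoint);
        System.out.println("lookup succeeded: " + !current.isUnresolved());
    }
}
```

      Once the hostname has been swapped for a resolved address, `resolveNow` has nothing left to look up, which is the situation described below.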

       

      In our case, all 3 nodes become unavailable while our CI/CD process destroys the existing cluster and replaces it with a new one. During this window of unavailability, the Java driver attempts to reconnect to each node. Internally, the driver holds resolved IP addresses for two of the nodes and the unresolved hostname for the third. The first attached screenshot captures the internal state of the 3 nodes within `PoolManager` before the redeployment of the cluster has finished. Note that there are 2 resolved IP addresses and 1 unresolved hostname.

      This 2:1 ratio of resolved IP addresses to unresolved hostname is the correct internal state for a 3-node cluster when `advanced.resolve-contact-points` is set to `FALSE`.

      Eventually, the hostname points to one of the 3 new valid nodes, and the Java driver reconnects and discovers the new peers. However, as part of this reconnection process, the internal Node that held the unresolved hostname is overwritten with a Node that has the resolved IP address, as the second attached screenshot shows.

      Note that we no longer have 2 resolved IP addresses and 1 unresolved hostname; rather, we have 3 resolved IP addresses, which is an incorrect internal state when `advanced.resolve-contact-points` is set to `FALSE`. One of the nodes should have retained the unresolved hostname.
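      The expected and observed states can be expressed as a simple check. This is an illustrative sketch only, not driver code: the class, method names, hostname, and IP addresses are all hypothetical, and the lists stand in for the node endpoints seen in `PoolManager`.

```java
import java.net.InetSocketAddress;
import java.util.List;

final class EndpointStateCheck {

    // Count the endpoints that still carry an unresolved hostname.
    static long countUnresolved(List<InetSocketAddress> endpoints) {
        return endpoints.stream().filter(InetSocketAddress::isUnresolved).count();
    }

    // With a single hostname contact point and resolve-contact-points=false,
    // at least one endpoint is expected to remain unresolved.
    static boolean hostnameRetained(List<InetSocketAddress> endpoints) {
        return countUnresolved(endpoints) >= 1;
    }

    public static void main(String[] args) {
        // Expected state: 2 resolved IPs (discovered peers) + 1 unresolved hostname.
        List<InetSocketAddress> expected = List.of(
            InetSocketAddress.createUnresolved("cassandra.example.internal", 9042),
            new InetSocketAddress("10.0.0.2", 9042),
            new InetSocketAddress("10.0.0.3", 9042));

        // State reported in this ticket after reconnection: 3 resolved IPs.
        List<InetSocketAddress> clobbered = List.of(
            new InetSocketAddress("10.0.1.1", 9042),
            new InetSocketAddress("10.0.1.2", 9042),
            new InetSocketAddress("10.0.1.3", 9042));

        System.out.println(hostnameRetained(expected));  // prints "true"
        System.out.println(hostnameRetained(clobbered)); // prints "false"
    }
}
```

      The IP literals are parsed directly and need no DNS lookup, so the sketch runs offline.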

      At this stage, the Java driver no longer queries the hostname for new connections, so our subsequent redeployments fail because the hostname is no longer among the nodes that the driver tries to reconnect to. This forces us to restart the application.

      Attachments

        1. image-2024-04-29-20-13-56-161.png
          166 kB
          Andrew Orlowski
        2. image-2024-04-29-22-57-26-910.png
          46 kB
          Andrew Orlowski

          People

            Unassigned Unassigned
            shot_up Andrew Orlowski