GEODE-1134

Stale Cluster Configuration Service

Details

    Description

      There seems to be an issue with the cluster configuration service: manual modifications to the "cluster.xml" and/or "cluster.properties" files are only picked up by the servers when the ENTIRE cluster is restarted.

      The official user guide states the following: "If you make any manual modifications to the cluster.xml or cluster.properties (or group_name.xml or group_name.properties) files, you must stop the locator and then restart the locator using the --load-cluster-configuration-from-dir parameter. Direct file modifications are not picked up by the cluster configuration service without a locator restart." So, in principle, you should be able to restart the members in a rolling fashion: as long as the locators are restarted first and correctly pick up the new cluster configuration files from disk, the servers should receive the new cluster configuration once they are restarted afterwards.

      This doesn't seem to be the case according to some tests I've done.
      Basically, the customer's requirement is to be able to manually modify the cluster.xml file without downtime, meaning that they are okay with restarting the members one at a time, but not all of them at the same time. They can't use gfsh scripts to make these modifications; being able to edit cluster.xml manually is a hard requirement.
      For some reason it is always required to stop the entire cluster (locators and servers); if you don't, the servers won't get the new cluster configuration. This can be reproduced on every run (with one, two or three locators; it doesn't matter). The reproducible scenario is attached to the JIRA; the steps are below:

      1. Download and extract the file "workspace.zip".
      2. Modify the file "00_setenv.txt", specifically the variables "JAVA_HOME" and "GEMFIRE" to use your local installation directories.
      3. Execute "01_start_cluster.sh" (starts N locators and M servers, where N and M are variables defined in "00_setenv.txt").
      4. Execute "02_configure_cluster.sh" (creates two regions and one index, just for testing purposes).
      5. Execute "03_change_cluster_config.sh". The main goal of this script is to replace the "cluster.xml" file with another one (located in GemFire/cluster/config/cluster.xml) and restart the members in different orders to verify whether the new configuration has been picked up by the servers. After each selection you can choose option "6" to verify the cluster configuration. As you can see, only option 5 (shut down the entire cluster) works correctly.
      6. Execute "04_stop_cluster.sh" and "05_clean_cluster.sh" to delete everything.
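
For reference, the rolling order that the user guide implies should work (locators first, reloading the edited files from disk, then servers one at a time) could be sketched as the dry run below. It only prints the gfsh commands rather than running them. The member names, the locator address and the idea that each member's working directory is named after the member are assumptions made for illustration; they are not taken from the attached scripts, and options such as ports are omitted.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints (does not execute) a rolling restart, locators first.
# LOCATORS, SERVERS and LOCATOR_ADDR are hypothetical placeholders.
LOCATORS="locator1 locator2"
SERVERS="server1 server2"
LOCATOR_ADDR="localhost[10334]"

rolling_restart_plan() {
  # Locators first, one at a time, so each reloads cluster.xml from disk.
  # Assumes each member's working directory is named after the member.
  for loc in $LOCATORS; do
    printf 'gfsh -e "stop locator --dir=%s"\n' "$loc"
    printf 'gfsh -e "start locator --name=%s --load-cluster-configuration-from-dir=true"\n' "$loc"
  done
  # Servers afterwards, one at a time, so the cluster never goes fully down.
  for srv in $SERVERS; do
    printf 'gfsh -e "stop server --dir=%s"\n' "$srv"
    printf 'gfsh -e "start server --name=%s --locators=%s"\n' "$srv" "$LOCATOR_ADDR"
  done
}

rolling_restart_plan
```

According to the tests described above, servers restarted in this order still come up with the old configuration, which is the behaviour being reported.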
      

      This might be a documentation bug, but I don't think so: if the cluster configuration is only stored in the locators, why do options 2 and 4 not work?

      Attachments

        1. workspace.zip
          10 kB
          Jens Deppe

          People

            jens.deppe Jens Deppe