Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
When trying to assign a non-persistent parallel gateway-sender / async-event-queue to a persistent partitioned region through gfsh, the actual region is left inconsistent in the cluster configuration service if the internal function is executed more than once.
The problem is that the gateway-sender / async-event-queue is added to the internal list too early within the execution lifecycle and, if the actual addition fails afterwards, the internal list is never reverted to its original state. When the command is executed a second time it reports success, and this invalid configuration is then persisted into the cluster configuration service, so every subsequent restart of the servers fails.
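The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not Geode code: the class `EarlyMutationSketch` and its methods are hypothetical, and the assumption that a retry skips the check because the id is already in the list is an inference from the behaviour reported in this ticket.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stand-in for a persistent partitioned region; not Geode code. */
public class EarlyMutationSketch {
  private final Set<String> gatewaySenderIds = new LinkedHashSet<>();
  private final boolean persistentRegion = true;

  /** Mimics the reported ordering: record the id first, attach afterwards. */
  public void alter(String senderId, boolean senderPersistent) {
    // The internal list is mutated before the attachment is known to be valid.
    boolean newlyAdded = gatewaySenderIds.add(senderId);
    if (newlyAdded) {
      // May throw; the id added above is never removed again.
      attach(senderId, senderPersistent);
    }
    // Assumption: ids already in the list are treated as attached,
    // so a retry never reaches the persistence check.
  }

  private void attach(String senderId, boolean senderPersistent) {
    if (persistentRegion && !senderPersistent) {
      throw new IllegalStateException("Non persistent gateway sender " + senderId
          + " can not be attached to persistent region");
    }
  }

  public Set<String> getGatewaySenderIds() {
    return gatewaySenderIds;
  }

  public static void main(String[] args) {
    EarlyMutationSketch region = new EarlyMutationSketch();
    try {
      region.alter("gateway", false); // first attempt throws
    } catch (IllegalStateException e) {
      System.out.println("first attempt failed: " + e.getMessage());
    }
    region.alter("gateway", false); // second attempt completes without an exception
    System.out.println("ids after retry: " + region.getGatewaySenderIds());
  }
}
```

In this sketch the second call returns normally even though nothing about the sender has changed, which mirrors the "successful" second `alter region` in the reproduction steps.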
The following set of steps reproduces the problem for a gateway-sender, but the logic is exactly the same for an async-event-queue:
gfsh -e "start locator --name=locator --port=10101"
gfsh -e "start server --name=server --server-port=40404 --locators=localhost[10101]"
gfsh -e "connect --locator=localhost[10101]" -e "create disk-store --name=diskStore --dir=diskStore"
gfsh -e "connect --locator=localhost[10101]" -e "create region --name=testRegion --type=PARTITION_PERSISTENT --disk-store=diskStore"
gfsh -e "connect --locator=localhost[10101]" -e "create gateway-sender --id=gateway --parallel=true --remote-distributed-system-id=2 --enable-persistence=false"

# First Execution Fails
gfsh -e "connect --locator=localhost[10101]" -e "alter region --name=testRegion --gateway-sender-id=gateway"
Member | Status | Message
------ | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------------
server | ERROR  | org.apache.geode.internal.cache.wan.GatewaySenderException: Non persistent gateway sender gateway can not be attached to persistent region /testRegion

# Second Execution Succeeds
gfsh -e "connect --locator=localhost[10101]" -e "alter region --name=testRegion --gateway-sender-id=gateway"
Member | Status | Message
------ | ------ | -------------------------
server | OK     | Region testRegion altered

gfsh -e "connect --locator=localhost[10101]" -e "stop server --name=server"
gfsh -e "start server --name=server --server-port=40404 --locators=localhost[10101]"
....The Cache Server process terminated unexpectedly with exit status 1. Please refer to the log file in /server for full details.
Exception in thread "main" org.apache.geode.internal.cache.wan.GatewaySenderException: Non persistent gateway sender gateway can not be attached to persistent region /testRegion
    at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:454)

# The log shows that the cluster configuration received is invalid:
[info 2019/03/21 11:52:57.606 GMT <main> tid=0x1] Received cluster configuration from the locator
[info 2019/03/21 11:52:57.638 GMT <main> tid=0x1]
***************************************************************
Configuration for 'cluster'

Jar files to deployed

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<cache xmlns="http://geode.apache.org/schema/cache" xmlns:jdbc="http://geode.apache.org/schema/jdbc" xmlns:lucene="http://geode.apache.org/schema/lucene" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0" xsi:schemaLocation="http://geode.apache.org/schema/lucene http://geode.apache.org/schema/lucene/lucene-1.0.xsd http://geode.apache.org/schema/jdbc http://geode.apache.org/schema/jdbc/jdbc-1.0.xsd http://geode.apache.org/schema/cache http://geode.apache.org/schema/cache/cache-1.0.xsd">
  <gateway-sender disk-synchronous="true" enable-batch-conflation="false" enable-persistence="false" id="gateway" manual-start="false" parallel="true" remote-distributed-system-id="2"/>
  <disk-store allow-force-compaction="false" auto-compact="true" compaction-threshold="50" disk-usage-critical-percentage="99" disk-usage-warning-percentage="90" max-oplog-size="1024" name="diskStore" queue-size="0" time-interval="1000" write-buffer-size="32768">
    <disk-dirs>
      <disk-dir dir-size="2147483647">diskStore</disk-dir>
    </disk-dirs>
  </disk-store>
  <region name="testRegion" refid="PARTITION_PERSISTENT">
    <region-attributes data-policy="persistent-partition" disk-store-name="diskStore" gateway-sender-ids="gateway"/>
  </region>
</cache>
Proposed fix: extend the validations that are invoked from within the RegionAlterFunction (added through GEODE-4919) so that they also include the persistence checks currently done in ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR. That way an invalid combination is rejected before the region's internal list is modified.
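A rough sketch of what "validate before mutating" could look like is given below. It is only an illustration under stated assumptions: `ValidateFirstSketch` and `SenderInfo` are hypothetical names, not the RegionAlterFunction or the Geode GatewaySender API, and the rule that only parallel senders are subject to the persistence check is taken from the wording of this ticket.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical region model that validates every sender before changing state. */
public class ValidateFirstSketch {

  /** Hypothetical minimal view of a sender; not the Geode GatewaySender interface. */
  public static final class SenderInfo {
    final String id;
    final boolean parallel;
    final boolean persistent;

    public SenderInfo(String id, boolean parallel, boolean persistent) {
      this.id = id;
      this.parallel = parallel;
      this.persistent = persistent;
    }
  }

  private final boolean persistentRegion;
  private final Set<String> gatewaySenderIds = new LinkedHashSet<>();

  public ValidateFirstSketch(boolean persistentRegion) {
    this.persistentRegion = persistentRegion;
  }

  /** Rejects a non-persistent parallel sender on a persistent region. */
  static void validate(boolean persistentRegion, SenderInfo sender) {
    if (persistentRegion && sender.parallel && !sender.persistent) {
      throw new IllegalStateException("Non persistent gateway sender " + sender.id
          + " can not be attached to persistent region");
    }
  }

  /** Checks all senders first; the internal list is touched only if all pass. */
  public void alter(List<SenderInfo> senders) {
    for (SenderInfo sender : senders) {
      validate(persistentRegion, sender);
    }
    for (SenderInfo sender : senders) {
      gatewaySenderIds.add(sender.id);
    }
  }

  public Set<String> getGatewaySenderIds() {
    return gatewaySenderIds;
  }

  public static void main(String[] args) {
    ValidateFirstSketch region = new ValidateFirstSketch(true);
    try {
      region.alter(List.of(new SenderInfo("gateway", true, false)));
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    System.out.println("ids after failed alter: " + region.getGatewaySenderIds());
  }
}
```

Because the check runs before any state changes, a failed call leaves the list exactly as it was, and repeating the same call fails again instead of silently succeeding.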