Description
On startup, a directory instance configured as a replication consumer is unable to bind to its local port until it can connect to the replication provider. In a two-node multi-master setup this creates a chicken-and-egg problem: neither node is able to start its LDAP port, and the following errors are repeated in the logs indefinitely.
Instance 1:
[12:58:26] ERROR [org.apache.directory.server.CONSUMER_LOG] - Failed to connect to the server localhost:11389, cause : Cannot connect on the server: Connection refused
[12:58:26] ERROR [org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl] - Failed to connect to the server localhost:11389, cause : Cannot connect on the server: Connection refused
Instance 2:
[12:58:14] ERROR [org.apache.directory.server.CONSUMER_LOG] - Failed to connect to the server localhost:10389, cause : Cannot connect on the server: Connection refused
[12:58:14] ERROR [org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl] - Failed to connect to the server localhost:10389, cause : Cannot connect on the server: Connection refused
netstat shows that the LDAP ports are not bound.
> netstat -a | egrep "10389|11389"
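The same check can be made without netstat by attempting a TCP connection to each LDAP port. The following is only a minimal sketch: the helper name `is_listening` is hypothetical, and the host and ports are taken from the log excerpts above. A refused connection shows only that nothing is accepting connections on that port. It does not explain why.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is accepting connections on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports used by the two instances in this report.
    for port in (10389, 11389):
        state = "accepting connections" if is_listening("localhost", port) else "not accepting connections"
        print(f"localhost:{port} {state}")
```

While both instances are stuck in the startup loop described above, both ports would be expected to report as not accepting connections.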
It is possible to trick the instances into starting up by starting instance 1 without being a replication consumer, then starting instance 2. I then stop instance 1, change it to be a consumer, and restart it. Both instances are then running, and netstat shows the replication connections and the listening LDAP ports. Replication now works in both directions.
> netstat -a | egrep "10389|11389"
tcp4 0 0 localhost.10389 localhost.51051 ESTABLISHED
tcp4 0 0 localhost.51051 localhost.10389 ESTABLISHED
tcp46 0 0 *.10389 *.* LISTEN
tcp4 0 0 localhost.11389 localhost.51050 ESTABLISHED
tcp4 0 0 localhost.51050 localhost.11389 ESTABLISHED
tcp46 0 0 *.11389 *.* LISTEN
I will attach the configuration files of the two instances, which can be used to reproduce this problem.