  1. Directory ApacheDS
  2. DIRSERVER-1894

Multi-Master replicated startup does not complete

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-M15
    • Fix Version/s: 2.0.0-M16
    • Component/s: ldap
    • Labels: None

    Description

      On startup, a directory instance configured as a replication consumer cannot bind its local LDAP port until it has connected to its replication provider. In a two-node multi-master setup this creates a chicken-and-egg problem: neither node can start listening on its LDAP port, and the following errors are repeated in the logs indefinitely.

      Instance 1:

      [12:58:26] ERROR [org.apache.directory.server.CONSUMER_LOG] - Failed to connect to the server localhost:11389, cause : Cannot connect on the server: Connection refused
      [12:58:26] ERROR [org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl] - Failed to connect to the server localhost:11389, cause : Cannot connect on the server: Connection refused

      Instance 2:

      [12:58:14] ERROR [org.apache.directory.server.CONSUMER_LOG] - Failed to connect to the server localhost:10389, cause : Cannot connect on the server: Connection refused
      [12:58:14] ERROR [org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl] - Failed to connect to the server localhost:10389, cause : Cannot connect on the server: Connection refused

      netstat shows that the LDAP ports are not bound.

      > netstat -a | egrep "10389|11389"
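The same check can be made without netstat by attempting a TCP connection to each port. The following is a minimal sketch using only the JDK; the class name `LdapPortProbe` and the one-second timeout are my own choices for illustration, and the ports are the two from this report.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of ApacheDS.
class LdapPortProbe {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing is accepting on this port.
            return false;
        }
    }

    public static void main(String[] args) {
        // The two LDAP ports used by the instances in this report.
        for (int port : new int[] { 10389, 11389 }) {
            System.out.println("localhost:" + port + " listening="
                    + isListening("localhost", port, 1000));
        }
    }
}
```

In the failing state described here, both probes would be expected to report that the port is not listening.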

      It is possible to trick the instances into starting by first starting instance 1 without its replication consumer configuration and then starting instance 2. I then stop instance 1, reconfigure it as a consumer, and restart it. Both instances are then running, and netstat shows the replication connections and the listening LDAP ports. Replication now works in both directions.

      > netstat -a | egrep "10389|11389"
      tcp4 0 0 localhost.10389 localhost.51051 ESTABLISHED
      tcp4 0 0 localhost.51051 localhost.10389 ESTABLISHED
      tcp46 0 0 *.10389 *.* LISTEN
      tcp4 0 0 localhost.11389 localhost.51050 ESTABLISHED
      tcp4 0 0 localhost.51050 localhost.11389 ESTABLISHED
      tcp46 0 0 *.11389 *.* LISTEN
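The startup ordering and the workaround described above can be modelled with a small toy program. This is only a sketch of the behaviour as I observe it from the outside, not the actual ApacheDS startup code; the class `StartupOrderModel`, the `Node` type, and every method name are hypothetical.

```java
import java.util.Arrays;

// Toy model of the observed startup ordering; none of these names exist in ApacheDS.
class StartupOrderModel {

    static final class Node {
        Node provider;        // replication provider this node consumes from, or null
        boolean listening;    // true once the LDAP port is bound
        boolean replicating;  // true once the consumer connection is up
    }

    // Ordering reported in this issue: the LDAP port is bound only after
    // the consumer has reached its provider.
    static void startConnectFirst(Node n) {
        if (n.provider != null) {
            if (!n.provider.listening) {
                return; // "Connection refused": the port stays unbound
            }
            n.replicating = true;
        }
        n.listening = true;
    }

    static void stop(Node n) {
        n.listening = false;
        n.replicating = false;
    }

    // Both nodes are consumers of each other from the first start.
    static boolean[] mutualConsumers(int attempts) {
        Node a = new Node();
        Node b = new Node();
        a.provider = b;
        b.provider = a;
        for (int i = 0; i < attempts; i++) {
            startConnectFirst(a);
            startConnectFirst(b);
        }
        return new boolean[] { a.listening, b.listening };
    }

    // The workaround: start instance 1 without a consumer, start instance 2,
    // then restart instance 1 as a consumer.
    static boolean[] workaround() {
        Node a = new Node();
        Node b = new Node();
        b.provider = a;
        startConnectFirst(a); // no provider configured, binds immediately
        startConnectFirst(b); // instance 1 is listening, so this succeeds
        stop(a);
        a.provider = b;       // instance 1 reconfigured as a consumer
        startConnectFirst(a); // instance 2 is already listening
        return new boolean[] { a.listening && a.replicating, b.listening };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(mutualConsumers(10))); // prints [false, false]
        System.out.println(Arrays.toString(workaround()));        // prints [true, true]
    }
}
```

However many times the two mutual consumers retry, neither ever binds its port in this model, whereas the staggered start succeeds. If the listener were bound before the consumer connection is attempted, the first case should succeed as well.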

      I will attach the configuration files of the two instances, which can be used to reproduce this problem.

      Attachments

        1. config-2.ldif
          41 kB
          Paul Bayliss
        2. config-1.ldif
          41 kB
          Paul Bayliss

          People

            Assignee: Kiran Ayyagari (akiran)
            Reporter: Paul Bayliss (prbayliss)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: