  1. Directory ApacheDS
  2. DIRSERVER-1894

Multi-Master replicated startup does not complete

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-M15
    • Fix Version/s: 2.0.0-M16
    • Component/s: ldap
    • Labels:
      None

      Description

      On startup of a directory instance configured as a replication consumer, the instance is unable to bind to its local port until a connection can be made to the replication provider. In a two-node multi-master setup this creates a chicken-and-egg problem: neither node is able to start its LDAP port, and the following errors are repeated in the logs indefinitely.

      Instance 1:

      [12:58:26] ERROR [org.apache.directory.server.CONSUMER_LOG] - Failed to connect to the server localhost:11389, cause : Cannot connect on the server: Connection refused
      [12:58:26] ERROR [org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl] - Failed to connect to the server localhost:11389, cause : Cannot connect on the server: Connection refused

      Instance 2:

      [12:58:14] ERROR [org.apache.directory.server.CONSUMER_LOG] - Failed to connect to the server localhost:10389, cause : Cannot connect on the server: Connection refused
      [12:58:14] ERROR [org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl] - Failed to connect to the server localhost:10389, cause : Cannot connect on the server: Connection refused

      netstat shows that the LDAP ports are not bound.

      > netstat -a | egrep "10389|11389"
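The same check can be made from Java when netstat is not at hand. This is a minimal sketch, not part of ApacheDS: the class name PortProbe and its method are hypothetical, and the ports are the two from this report. It only tests whether something accepts a TCP connection on the port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {

    /** Returns true if something accepts TCP connections on host:port. */
    public static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, host unreachable, or timed out.
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports taken from the two instances described in this report.
        for (int port : new int[] { 10389, 11389 }) {
            System.out.println("localhost:" + port + " listening: "
                    + isListening("localhost", port, 500));
        }
    }
}
```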

      It is possible to trick the instances into starting up by starting instance 1 without it being a replication consumer, then starting instance 2. I then stop instance 1, change it to be a consumer, and restart it. Both instances are then running, and netstat shows the replication connections and the listening LDAP ports. Replication now works in both directions.

      > netstat -a | egrep "10389|11389"
      tcp4 0 0 localhost.10389 localhost.51051 ESTABLISHED
      tcp4 0 0 localhost.51051 localhost.10389 ESTABLISHED
      tcp46 0 0 *.10389 *.* LISTEN
      tcp4 0 0 localhost.11389 localhost.51050 ESTABLISHED
      tcp4 0 0 localhost.51050 localhost.11389 ESTABLISHED
      tcp46 0 0 *.11389 *.* LISTEN

      I will attach the configuration file of the two instances that can be used to reproduce this problem.
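The startup ordering described in this report can be modelled in a few lines of Java. This is an illustrative sketch only, not ApacheDS code and not the change that went into 2.0.0-M16: the class StartupOrderSketch, the Node type and its methods are all hypothetical, and a "connection" is simulated by checking whether the peer has marked itself as listening. It contrasts the reported ordering (connect to the provider, then bind) with a bind-first ordering that retries the provider connection in the background.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class StartupOrderSketch {

    /** Hypothetical stand-in for one server instance; not an ApacheDS class. */
    static final class Node {
        final String name;
        final AtomicBoolean listening = new AtomicBoolean(false);
        final CountDownLatch replicationConnected = new CountDownLatch(1);
        volatile Node peer;

        Node(String name) { this.name = name; }

        /** Simulated connect: succeeds only if the peer has bound its port. */
        boolean tryConnectToPeer() { return peer.listening.get(); }

        /** Reported ordering: wait for the provider first, bind afterwards. */
        boolean startBlocking(int maxAttempts) throws InterruptedException {
            for (int i = 0; i < maxAttempts; i++) {
                if (tryConnectToPeer()) {
                    listening.set(true);
                    replicationConnected.countDown();
                    return true;
                }
                Thread.sleep(10); // "Connection refused", retry
            }
            return false; // the bind step was never reached
        }

        /** Alternative ordering: bind first, retry the provider in the background. */
        void startNonBlocking() {
            listening.set(true);
            Thread consumer = new Thread(() -> {
                try {
                    while (!tryConnectToPeer()) {
                        Thread.sleep(10);
                    }
                    replicationConnected.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, name + "-consumer");
            consumer.setDaemon(true);
            consumer.start();
        }
    }

    /** Builds two nodes that are each other's replication provider. */
    static Node[] pair() {
        Node a = new Node("instance-1");
        Node b = new Node("instance-2");
        a.peer = b;
        b.peer = a;
        return new Node[] { a, b };
    }

    /** Starts both nodes with the blocking ordering; true if either one bound. */
    public static boolean blockingOrderStarts() throws Exception {
        Node[] n = pair();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Boolean> first = pool.submit(() -> n[0].startBlocking(5));
            Future<Boolean> second = pool.submit(() -> n[1].startBlocking(5));
            boolean firstBound = first.get();
            boolean secondBound = second.get();
            return firstBound || secondBound;
        } finally {
            pool.shutdownNow();
        }
    }

    /** Starts both nodes bind-first; true if both listen and both connect. */
    public static boolean nonBlockingOrderStarts() throws InterruptedException {
        Node[] n = pair();
        n[0].startNonBlocking();
        n[1].startNonBlocking();
        boolean firstConnected = n[0].replicationConnected.await(2, TimeUnit.SECONDS);
        boolean secondConnected = n[1].replicationConnected.await(2, TimeUnit.SECONDS);
        return firstConnected && secondConnected
                && n[0].listening.get() && n[1].listening.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("connect-then-bind started: " + blockingOrderStarts());
        System.out.println("bind-then-connect started: " + nonBlockingOrderStarts());
    }
}
```

In this model the connect-then-bind ordering never starts either node, because each one waits for a port that the other has not yet opened. The bind-first ordering lets both nodes listen immediately and establish replication once the peer is up. The manual workaround described above has the same effect, since it lets one instance bind before the other tries to connect.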

        Attachments

        1. config-1.ldif
          41 kB
          Paul Bayliss
        2. config-2.ldif
          41 kB
          Paul Bayliss

            People

            • Assignee:
              akiran Kiran Ayyagari
              Reporter:
              prbayliss Paul Bayliss