Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
ManifoldCF 1.5
-
None
-
4 Agents
3 member ZK ensemble (2 live, 1 dead)
Description
If one member of the ZK ensemble is down but a majority of members is still active (so ZK is 'live'), then when the agents start up, any agent that tries to connect to the missing member aborts with:
Opening socket connection to server overlorddev03/10.250.0.36:2181. Will not attempt to authenticate using SASL (unknown error)
71 [main-SendThread(overlorddev03:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
followed by:
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Initialization failed: KeeperErrorCode = ConnectionLoss for /org.apache.manifoldcf.configuration
at org.apache.manifoldcf.core.system.ManifoldCF.initializeEnvironment(ManifoldCF.java:269)
at org.apache.manifoldcf.agents.system.ManifoldCF.initializeEnvironment(ManifoldCF.java:43)
at org.apache.manifoldcf.agents.BaseAgentsInitializationCommand.execute(BaseAgentsInitializationCommand.java:36)
at org.apache.manifoldcf.agents.AgentRun.main(AgentRun.java:93)
This has a knock-on effect on the other agents, which eventually fail with 'agents process could not start - shutting down'.
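The issue does not say how the fix was implemented, so the following is only a minimal sketch of one common way to tolerate a transient connection loss during startup: retry the operation a bounded number of times with a growing delay, so the client has a chance to reach a live ensemble member. `ConnectionRetry`, `callWithRetry` and `TransientConnectionException` are hypothetical names introduced for illustration; they are not ManifoldCF or ZooKeeper classes, and the simulated failure stands in for a real `ConnectionLoss` error.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of ManifoldCF or ZooKeeper). */
final class ConnectionRetry {
    private ConnectionRetry() {}

    /** Hypothetical stand-in for a transient failure such as ConnectionLoss. */
    static final class TransientConnectionException extends Exception {
        TransientConnectionException(String message) {
            super(message);
        }
    }

    /**
     * Runs the operation, retrying only on TransientConnectionException.
     * The delay doubles after each failed attempt; the last failure is rethrown.
     */
    static <T> T callWithRetry(Callable<T> operation, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (TransientConnectionException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: report the last failure to the caller
                }
                Thread.sleep(delay);
                delay *= 2; // simple exponential backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulate two failed connection attempts followed by a success.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new TransientConnectionException("ConnectionLoss (simulated)");
            }
            return "connected";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With a wrapper of this kind, an agent that first contacts the dead member would wait and try again instead of aborting initialization on the first failure.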
Besides exceptions of this type:
5401 [main-SendThread(overlorddev03:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server overlorddev03/10.250.0.36:2181. Will not attempt to authenticate using SASL (unknown error)
5403 [main-SendThread(overlorddev03:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
5506 [main-SendThread(overlorddev04:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server overlorddev04/10.250.0.46:2181. Will not attempt to authenticate using SASL (unknown error)
5507 [main-SendThread(overlorddev04:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to overlorddev04/10.250.0.46:2181, initiating session
the only other notable exception is:
5509 [main-SendThread(overlorddev04:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server overlorddev04/10.250.0.46:2181, sessionid = 0x4444f2cb0590087, negotiated timeout = 8000
org.apache.manifoldcf.core.interfaces.ManifoldCFException: KeeperErrorCode = ConnectionLoss for /org.apache.manifoldcf.flags-AGENTRUN
at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.checkGlobalFlag(ZooKeeperConnection.java:499)
at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.checkGlobalFlag(ZooKeeperLockManager.java:787)
at org.apache.manifoldcf.agents.system.AgentsDaemon.runAgents(AgentsDaemon.java:110)
at org.apache.manifoldcf.agents.AgentRun.doExecute(AgentRun.java:64)
at org.apache.manifoldcf.agents.BaseAgentsInitializationCommand.execute(BaseAgentsInitializationCommand.java:37)
at org.apache.manifoldcf.agents.AgentRun.main(AgentRun.java:93)
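For context on the environment in this report (a 3 member ensemble with 2 live members), the sketch below shows the majority rule that decides whether the ensemble as a whole is still 'live'. It assumes ZooKeeper's default majority quorum, where strictly more than half of the members must be up. `QuorumCheck` and `hasQuorum` are hypothetical names used for illustration only.

```java
/** Hypothetical helper illustrating majority quorum arithmetic. */
final class QuorumCheck {
    private QuorumCheck() {}

    /** Returns true when strictly more than half of the ensemble is live. */
    static boolean hasQuorum(int ensembleSize, int liveMembers) {
        if (ensembleSize < 1 || liveMembers < 0 || liveMembers > ensembleSize) {
            throw new IllegalArgumentException("invalid ensemble size or live member count");
        }
        return liveMembers > ensembleSize / 2;
    }

    public static void main(String[] args) {
        // The setup from this report: 3 members, 2 live, 1 dead.
        System.out.println("3 members, 2 live: " + hasQuorum(3, 2));
        // Losing a second member would drop the ensemble below a majority.
        System.out.println("3 members, 1 live: " + hasQuorum(3, 1));
    }
}
```

Under that rule the ensemble in this report still has a quorum, which is why a client pointed at a live member can connect while a client pointed at the dead member sees `Connection refused`.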