A Tomcat cluster, configured as
---------------------------------------------------------------------------
<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
         managerClassName="org.apache.catalina.cluster.session.DeltaManager"
         expireSessionsOnShutdown="false"
         useDirtyFlag="true">
-------------------------------------------------------------------------
When I kill one node and then bring it back up, it does not receive session data updates from the other node. This is what I see after bringing the dead node back up:
----------------------------------------------------------------------------
Created MBeanServer with ID: 18020cc:102f5dc8465:-8000:donau:1
Mar 30, 2005 3:47:13 PM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
Mar 30, 2005 3:47:13 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 1273 ms
Mar 30, 2005 3:47:13 PM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
Mar 30, 2005 3:47:13 PM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/5.0
Mar 30, 2005 3:47:13 PM org.apache.catalina.core.StandardHost start
INFO: XML validation disabled
Mar 30, 2005 3:47:13 PM org.apache.catalina.cluster.tcp.SimpleTcpCluster start
INFO: Cluster is about to start
Mar 30, 2005 3:47:13 PM org.apache.catalina.cluster.mcast.McastService start
INFO: Sleeping for 2000 secs to establish cluster membership
Mar 30, 2005 3:47:14 PM org.apache.catalina.cluster.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://xxx.xx.20.218:4002,xxx.xx.20.218,4002, alive=10907852]
Mar 30, 2005 3:47:14 PM org.apache.catalina.cluster.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://127.0.0.1:4002,127.0.0.1,4002, alive=92927450]
Mar 30, 2005 3:47:16 PM org.apache.catalina.core.StandardHost getDeployer
INFO: Create Host deployer for direct deployment ( non-jmx )
Mar 30, 2005 3:47:16 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Processing Context configuration file URL file:/opt/dev/share/jakarta/tomcat/base/conf/Catalina/localhost/balancer.xml
Mar 30, 2005 3:47:16 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Processing Context configuration file URL file:/opt/dev/share/jakarta/tomcat/base/conf/Catalina/localhost/manager.xml
Mar 30, 2005 3:47:17 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Processing Context configuration file URL file:/opt/dev/share/jakarta/tomcat/base/conf/Catalina/localhost/admin.xml
Mar 30, 2005 3:47:17 PM org.apache.struts.util.PropertyMessageResources <init>
INFO: Initializing, config='org.apache.struts.util.LocalStrings', returnNull=true
Mar 30, 2005 3:47:17 PM org.apache.struts.util.PropertyMessageResources <init>
INFO: Initializing, config='org.apache.struts.action.ActionResources', returnNull=true
Mar 30, 2005 3:47:17 PM org.apache.struts.util.PropertyMessageResources <init>
INFO: Initializing, config='org.apache.webapp.admin.ApplicationResources', returnNull=true
Mar 30, 2005 3:47:20 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Installing web application at context path /tomcat-docs from URL file:/opt/dev/share/jakarta/tomcat/base/webapps/tomcat-docs
Mar 30, 2005 3:47:20 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Installing web application at context path /jsp-examples from URL file:/opt/dev/share/jakarta/tomcat/base/webapps/jsp-examples
Mar 30, 2005 3:47:20 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Installing web application at context path from URL file:/opt/dev/share/jakarta/tomcat/base/webapps/ROOT
Mar 30, 2005 3:47:20 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Installing web application at context path /webdav from URL file:/opt/dev/share/jakarta/tomcat/base/webapps/webdav
Mar 30, 2005 3:47:20 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Installing web application at context path /clusterapp from URL file:/opt/dev/share/jakarta/tomcat/base/webapps/clusterapp
Creating ClusterManager for context /clusterapp using class org.apache.catalina.cluster.session.DeltaManager
Mar 30, 2005 3:47:20 PM org.apache.catalina.cluster.session.DeltaManager start
INFO: Starting clustering manager...:/clusterapp
Mar 30, 2005 3:47:20 PM org.apache.catalina.cluster.tcp.ReplicationTransmitter sendMessageData
WARNING: Unable to send replicated message, is server down?
java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:305)
	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:171)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:158)
	at java.net.Socket.connect(Socket.java:452)
	at java.net.Socket.connect(Socket.java:402)
	at java.net.Socket.<init>(Socket.java:309)
	at java.net.Socket.<init>(Socket.java:153)
	at org.apache.catalina.cluster.tcp.SocketSender.connect(SocketSender.java:66)
	at org.apache.catalina.cluster.tcp.SocketSender.sendMessage(SocketSender.java:112)
	at org.apache.catalina.cluster.tcp.PooledSocketSender.sendMessage(PooledSocketSender.java:119)
	at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessageData(ReplicationTransmitter.java:117)
	at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:136)
	at org.apache.catalina.cluster.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:457)
	at org.apache.catalina.cluster.session.DeltaManager.start(DeltaManager.java:648)
	at org.apache.catalina.core.ContainerBase.setManager(ContainerBase.java:499)
	at org.apache.catalina.startup.ContextConfig.managerConfig(ContextConfig.java:308)
	at org.apache.catalina.startup.ContextConfig.start(ContextConfig.java:635)
	at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:216)
	at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
	at org.apache.catalina.core.StandardContext.start(StandardContext.java:4290)
	at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:823)
	at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:807)
	at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:595)
	at org.apache.catalina.core.StandardHostDeployer.install(StandardHostDeployer.java:277)
	at org.apache.catalina.core.StandardHost.install(StandardHost.java:832)
	at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:701)
	at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:432)
	at org.apache.catalina.startup.HostConfig.start(HostConfig.java:983)
	at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:349)
	at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
	at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1091)
	at org.apache.catalina.core.StandardHost.start(StandardHost.java:789)
	at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1083)
	at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:478)
	at org.apache.catalina.core.StandardService.start(StandardService.java:480)
	at org.apache.catalina.core.StandardServer.start(StandardServer.java:2365)
	at org.apache.catalina.startup.Catalina.start(Catalina.java:556)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:324)
	at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:287)
	at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:425)
Mar 30, 2005 3:47:20 PM org.apache.catalina.cluster.session.DeltaManager start
WARNING: Manager[/clusterapp], requesting session state from org.apache.catalina.cluster.mcast.McastMember[tcp://127.0.0.1:4002,127.0.0.1,4002, alive=92933570]. This operation will timeout if no session state has been received within 60 seconds
Mar 30, 2005 3:48:21 PM org.apache.catalina.cluster.session.DeltaManager start
SEVERE: Manager[/clusterapp], No session state received, timing out.
ClusterApp context is created.
Mar 30, 2005 3:48:21 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Installing web application at context path /servlets-examples from URL file:/opt/dev/share/jakarta/tomcat/base/webapps/servlets-examples
Mar 30, 2005 3:48:21 PM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080
Mar 30, 2005 3:48:21 PM org.apache.jk.server.JkMain start
INFO: APR not loaded, disabling jni components: java.io.IOException: java.lang.UnsatisfiedLinkError: no jkjni in java.library.path
Mar 30, 2005 3:48:21 PM org.apache.jk.common.ChannelSocket init
INFO: JK2: ajp13 listening on /0.0.0.0:8009
Mar 30, 2005 3:48:21 PM org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=3/94 config=/opt/dev/share/jakarta/tomcat/base/conf/jk2.properties
Mar 30, 2005 3:48:21 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 68088 ms
--------------------------------------------------------------------------------
It times out and creates a new session context for my web application, so when I switch to this server, my old session data is lost. I modified DeltaManager.java a bit, and that seems to have solved the problem.
--------------------------------------------------------------------------
[hzhao@donau session]$ diff DeltaManager.java /opt/jakarta-tomcat-5.0.28-src/jakarta-tomcat-catalina/modules/cluster/src/share/org/apache/catalina/cluster/session/DeltaManager.java
632d631
< Member mbr=null;
634,642c633
< for(int index=0; index<cluster.getMembers().length; index++) {
< mbr = cluster.getMembers()[index];
< if (mbr.getHost().equals("127.0.0.1"))
< mbr = null;
< else
< break;
< }
< }
< if (mbr != null) {
---
> Member mbr = cluster.getMembers()[0];
[hzhao@donau session]$
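The diff is easier to follow as a standalone method. The sketch below restates the member selection before and after the change, outside Tomcat. `Member` is a hypothetical record standing in for Tomcat's cluster member type, and 192.0.2.218 is a made-up documentation address standing in for the masked xxx.xx.20.218. Skipping 127.0.0.1 only hides the symptom; it does not explain where that member comes from.

```java
import java.util.List;

/** Hypothetical stand-in for a cluster member's advertised host and port. */
record Member(String host, int port) {}

class StateSourceSelection {

    /** Unpatched behaviour: always ask the first member in the array for session state. */
    static Member originalChoice(List<Member> members) {
        return members.isEmpty() ? null : members.get(0);
    }

    /** Patched behaviour: skip members advertising 127.0.0.1; null if none remain. */
    static Member patchedChoice(List<Member> members) {
        for (Member m : members) {
            if (!m.host().equals("127.0.0.1")) {
                return m;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Order mirrors the log: the loopback entry is the one state was requested from.
        List<Member> members = List.of(
                new Member("127.0.0.1", 4002),
                new Member("192.0.2.218", 4002));
        System.out.println("original: " + originalChoice(members));
        System.out.println("patched:  " + patchedChoice(members));
    }
}
```

The unpatched code trusts whichever member happens to be first, so in this log a single bogus announcement was enough to stall state transfer for the full 60-second timeout.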
I reviewed the log again, and obviously I was wrong; my code above is wrong too. The cause is that the node xxx.xx.20.218:4002 is a normal node and should be added as an active member of the cluster, but somehow it also showed up as 127.0.0.1:4002, which was added to the active members as well (the node on localhost uses port 4001, so this is not the local node), and that caused the timeout. I wonder why this happens. Sorry if this caused any confusion.
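The reasoning about ports can be written down as a small check. The sketch below is a hypothetical helper, not part of Tomcat's cluster API: it flags a member that advertises a loopback address on a port other than this node's own listen port. Such a member cannot be this node, and connecting to it only reaches the local machine, where nothing listens on that port; that would be consistent with the `Connection refused` in the log above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical sanity check for suspicious membership announcements. */
class MemberSanityCheck {

    /**
     * True when the advertised host is a loopback address and the advertised
     * port differs from this node's own listen port.
     */
    static boolean isSuspectLoopbackMember(String advertisedHost, int advertisedPort, int localPort)
            throws UnknownHostException {
        // For an IP literal, getByName just parses the string; no DNS lookup happens.
        boolean loopback = InetAddress.getByName(advertisedHost).isLoopbackAddress();
        return loopback && advertisedPort != localPort;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Values from the report: local node listens on 4001, odd member is 127.0.0.1:4002.
        System.out.println(isSuspectLoopbackMember("127.0.0.1", 4002, 4001));   // true
        System.out.println(isSuspectLoopbackMember("192.0.2.218", 4002, 4001)); // false
    }
}
```

A flagged member is not necessarily bogus (it could be a second instance on the same machine), but it is worth inspecting before trusting it as the source of session state.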
I checked thoroughly today and found the problem: one of the computers on our intranet runs a Tomcat instance that was sending out wrong information, "tcp listen address 127.0.0.1:4002". I changed the multicast port, and now everything works. This is not a bug. Sorry for the confusion.
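One way to track down a machine like that is to listen on the cluster's multicast group and look at the source address of each heartbeat datagram, which shows the real sender regardless of the address it advertises. A minimal diagnostic sketch follows; `McastSnoop` is a hypothetical name, and 228.0.0.4:45564 are assumed defaults taken from Tomcat's sample server.xml, so pass your own `mcastAddr` and `mcastPort` as arguments.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

/** Diagnostic sketch: report who is actually sending on the membership multicast group. */
class McastSnoop {

    /** Formats the sender of a received datagram as "host:port (n bytes)". */
    static String describeSender(DatagramPacket packet) {
        return packet.getAddress().getHostAddress() + ":" + packet.getPort()
                + " (" + packet.getLength() + " bytes)";
    }

    public static void main(String[] args) throws IOException {
        // ASSUMPTION: sample server.xml values; override with: java McastSnoop <group> <port>
        String group = args.length > 0 ? args[0] : "228.0.0.4";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 45564;
        try (MulticastSocket socket = new MulticastSocket(port)) {
            // joinGroup(InetAddress) is deprecated since Java 14 but still available.
            socket.joinGroup(InetAddress.getByName(group));
            byte[] buf = new byte[8192];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);
                // The payload is Tomcat's own membership format; only the sender is reported here.
                System.out.println("heartbeat from " + describeSender(packet));
            }
        }
    }
}
```

Running it against the old and the new multicast port would show whether any unexpected host is still sending membership heartbeats to the cluster.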