Bug 34647

Summary: Tomcat cluster - "Unable to receive message through TCP Channel"
Product: Tomcat 5
Reporter: Anabel <aleonben>
Component: Catalina:Cluster
Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED DUPLICATE    
Severity: major    
Priority: P2    
Version: 5.0.28   
Target Milestone: ---   
Hardware: Other   
OS: Linux   

Description Anabel 2005-04-27 15:25:28 UTC
We have a cluster with two Tomcat servers. When we restart one of the nodes 
without restarting the other one, there seems to be a problem in the 
communication between them. This is the log trace from the restarted node 
during startup:

[main] INFO  org.apache.catalina.cluster.session.DeltaManager  - Starting clustering manager...:/TEST
[main] WARN  org.apache.catalina.cluster.session.DeltaManager  - Manager [/TEST], requesting session state from org.apache.catalina.cluster.mcast.McastMember[tcp://XXX.XXX.XXX.XXX:4001,XXX.XXX.XXX.XXXX,4001, alive=14436991]. This operation will timeout if no session state has been received within 60 seconds
[main] ERROR org.apache.catalina.cluster.session.DeltaManager  - Manager [/TEST], No session state received, timing out.
org.apache.jk.common.ChannelSocket init


And this is the log trace from the node that stayed up:

org.apache.catalina.cluster.tcp.SimpleTcpCluster memberDisappeared
INFO: Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://YYY.YYY.YYY.YYY:4001,YYY.YYY.YYY.YYY,4001, alive=6147693]
org.apache.catalina.cluster.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://YYY.YYY.YYY.YYY:4001,YYY.YYY.YYY.YYY,4001, alive=2]
[org.apache.catalina.cluster.tcp.TcpReplicationThread[3]] ERROR org.apache.catalina.cluster.session.DeltaManager  - Unable to receive message through TCP channel
java.lang.NullPointerException
        at java.io.ObjectOutputStream$BlockDataOutputStream.getUTFLength(ObjectOutputStream.java:1898)
        at java.io.ObjectOutputStream$BlockDataOutputStream.writeUTF(ObjectOutputStream.java:1769)
        at java.io.ObjectOutputStream.writeUTF(ObjectOutputStream.java:787)
        at org.apache.catalina.cluster.session.SerializablePrincipal.writePrincipal(SerializablePrincipal.java:180)
        at org.apache.catalina.cluster.session.DeltaSession.writeObject(DeltaSession.java:1457)
        at org.apache.catalina.cluster.session.DeltaSession.writeObjectData(DeltaSession.java:930)
        at org.apache.catalina.cluster.session.DeltaManager.doUnload(DeltaManager.java:539)
        at org.apache.catalina.cluster.session.DeltaManager.messageReceived(DeltaManager.java:854)
        at org.apache.catalina.cluster.session.DeltaManager.messageDataReceived(DeltaManager.java:762)
        at org.apache.catalina.cluster.tcp.SimpleTcpCluster.messageDataReceived(SimpleTcpCluster.java:576)
        at org.apache.catalina.cluster.io.ObjectReader.execute(ObjectReader.java:70)
        at org.apache.catalina.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:129)
        at org.apache.catalina.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:67)

I found a similar bug, 32280, but it was closed without a clear solution.

Thanks in advance.
Comment 1 Filip Hanik 2005-04-27 17:53:53 UTC
The stack trace indicates that you have a principal (you are logged in) but 
the login name is null. Could you give us a small test case, if you can create 
one that reproduces the error?
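The diagnosis above can be reproduced outside Tomcat: `ObjectOutputStream.writeUTF` does not accept `null`, so writing a principal whose name is `null` fails inside `getUTFLength`, just as in the reported trace. The following is a minimal sketch; the class `NullUtfDemo` and its helper `tryWriteUtf` are hypothetical and not part of Tomcat.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class NullUtfDemo {
    // Hypothetical helper: returns true if writeUTF accepted the value,
    // false if it threw a NullPointerException (the failure in the trace).
    static boolean tryWriteUtf(String value) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(value);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tryWriteUtf("alice")); // true
        System.out.println(tryWriteUtf(null));    // false: writeUTF(null) throws NPE
    }
}
```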
Comment 2 Anabel 2005-04-28 08:49:56 UTC
(In reply to comment #1)
> The stack trace indicates that you have a principal (you are logged in) but 
> the login name is null. Could you give us a small test case if you can create 
> one and reproduce the error?

Hi Filip.

I have just run a test. I stopped one of the nodes in the cluster and started 
it again. As there weren't any active sessions at that moment, no problem was 
reported when the node started. The two nodes found each other without any 
problem.

Then I logged in as a user. The login seemed to work, and I could use the 
application normally. When I stopped one of the nodes again, I could continue 
working with the application because the other node took over. But when I 
restarted the node that had been down, the situation described in my first 
post occurred again.

At that point, the only way to get the two nodes communicating correctly again 
is to stop and start both of them.

This is a big problem, because I don't really have a cluster: I only get load 
balancing and failover for the first failure. Once one node fails, I can't 
re-form the cluster unless I restart both nodes!

Best regards.
Comment 3 Anabel 2005-05-04 10:56:28 UTC
(In reply to comment #1)
> The stack trace indicates that you have a principal (you are logged in) but 
> the login name is null. Could you give us a small test case if you can create 
> one and reproduce the error?

Hi Filip...

Do you have any news about this bug?

Thanks in advance,

Anabel.
Comment 4 Peter Rossbach 2005-10-20 08:41:35 UTC
Please check your configuration with Tomcat 5.5.12.

I fixed the serialization for some realm implementations.
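One general way to make principal serialization tolerant of missing values is to write a presence flag before each optional string, so a `null` never reaches `writeUTF`. The sketch below illustrates that technique only; it is not the actual change made for 5.5.12 or bug 36218. The class `NullSafeUtf`, its methods, and the assumption of a principal with a name and a password field are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class NullSafeUtf {
    // Writes a presence flag, then the string only when it is non-null,
    // so a null field never reaches writeUTF.
    static void writeNullableUtf(ObjectOutputStream out, String value) throws IOException {
        out.writeBoolean(value != null);
        if (value != null) {
            out.writeUTF(value);
        }
    }

    // Mirror of writeNullableUtf: reads the flag and returns null when absent.
    static String readNullableUtf(ObjectInputStream in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }

    // Round-trips two hypothetical principal fields (name, password).
    static String[] roundTrip(String name, String password) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            writeNullableUtf(out, name);
            writeNullableUtf(out, password);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            // Array initializers are evaluated left to right, matching write order.
            return new String[] { readNullableUtf(in), readNullableUtf(in) };
        }
    }

    public static void main(String[] args) throws IOException {
        String[] restored = roundTrip(null, "secret");
        System.out.println(restored[0] + " / " + restored[1]); // null / secret
    }
}
```

The flag costs one byte per field, and the reader must consume fields in exactly the order the writer produced them.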

Peter


*** This bug has been marked as a duplicate of 36218 ***