[TS-1647] VMWARE cannot read from cache node marked as down - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: Cache, Clustering
Labels: None

Backport to Version: 3.2.0

Description

We are seeing issues related to cluster management on VMware. It seems that a NIC goes to sleep and loses packets during a read operation from a member of the cluster. ATS then marks that node as down, and it stays down until the entire cluster is restarted.

Idle time seems to be at the heart of the issue. Allocating 500 MHz to each VM or pinning the CPU doesn't help; the NIC still sleeps. tcpdump sees missing segments yet reports 0 packets lost. Under load, dmesg shows:

RPC: bad TCP reclen 0x73746174 (non-terminal)
RPC: bad TCP reclen 0x63480000 (large)
RPC: bad TCP reclen 0x633f0000 (large)

and ATS reports that it cannot read from a cluster node and marks the node down.
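
As a diagnostic aid, the reclen values from dmesg can be decoded by hand. The sketch below assumes each value is the 4-byte RPC record marker as the kernel printed it (high bit = last-fragment flag, low 31 bits = fragment length, per the ONC RPC record marking standard). The helper name `decode_reclen` is hypothetical and is not part of ATS or the kernel. If the bytes turn out to be printable ASCII, that may hint that non-RPC data is being parsed as an RPC record header, though this alone does not prove it.

```python
import struct


def decode_reclen(value: int) -> dict:
    """Split a 32-bit RPC record marker into its fields (hypothetical helper)."""
    raw = struct.pack(">I", value)  # big-endian, the byte order used on the wire
    return {
        "last_fragment": bool(value & 0x80000000),  # high bit set on the final fragment
        "length": value & 0x7FFFFFFF,               # low 31 bits: fragment length in bytes
        # Show printable bytes as characters, everything else as '.'
        "ascii": "".join(chr(b) if 32 <= b < 127 else "." for b in raw),
    }


if __name__ == "__main__":
    # Values copied from the dmesg output in this report
    for v in (0x73746174, 0x63480000, 0x633F0000):
        print(f"0x{v:08x}", decode_reclen(v))
```

For the first value, the four bytes are the ASCII characters `stat`, and the implied fragment length is close to 2 GB, which is not a plausible RPC record size.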

We have another datacenter where this does not happen.

The difference in kernel revisions is:

Sleeping NIC Data Center RHEL 5
2.6.18-308.16.1.el5

Working Data Center
2.6.18-194.3.1.el5

We have validated that VMware is running properly in this datacenter, but we are trying to get a ticket open with them to look into why one configuration works and the other does not.

Everything else in the two configurations is nearly identical. We are going to try to get the NIC drivers updated. As you can see, it is the later Linux kernel version that is causing the headaches.

Any ideas would really be appreciated ...

Thank you,

Dow

Attachments

909smokinggun.png, 12/Jan/13 04:43, 153 kB, Dow Buzzell
stackbug.png, 12/Jan/13 05:10, 160 kB, Dow Buzzell

People

Assignee: Unassigned

Reporter: Dow Buzzell

Votes: 0

Watchers: 1

Dates

Created: 11/Jan/13 04:48

Updated: 17/Jan/13 04:27

Resolved: 16/Jan/13 04:47