[VCL-924] Commands may hang on management node if it has an unavailable NFS share - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.2
Fix Version/s: 2.5
Component/s: vcld (backend)
Labels: None

Description

We came across a situation on one of our management nodes related to this:
https://bugzilla.redhat.com/show_bug.cgi?id=962755

The management node had an old NFS share mounted from a storage unit which was removed from service. Attempts to unmount the share were not successful.

Under fairly rare circumstances, a vcld process will call lsof on the management node to determine which other vcld process is preventing it from obtaining a semaphore. In our case, that vcld process hung indefinitely because of the unavailable NFS share and the issue described in the link above.

The code that executes commands locally on the management node currently has no timeout mechanism. It would be beneficial to add one and to specify a timeout for commands that may hang, such as lsof.
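
As an illustration of the kind of timeout mechanism proposed above, here is a minimal sketch in Python. It is not the vcld implementation; the function name `run_with_timeout` and its parameters are hypothetical, and it assumes a POSIX host. The child is killed when the timeout expires, and the wait after the kill is itself bounded, because a process blocked on an unreachable NFS mount may not exit promptly even after being killed.

```python
import subprocess


def run_with_timeout(command, timeout_seconds, grace_seconds=1.0):
    """Run a local command with a time limit (hypothetical helper).

    Returns (exit_status, output). exit_status is None if the command
    did not finish within timeout_seconds.
    """
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured output
        text=True,
    )
    try:
        output, _ = process.communicate(timeout=timeout_seconds)
        return process.returncode, output
    except subprocess.TimeoutExpired:
        # Ask the kernel to terminate the child.
        process.kill()
        try:
            # Bounded wait: a child stuck in uninterruptible I/O may not
            # go away immediately, so never block here indefinitely.
            output, _ = process.communicate(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            output = ""
        return None, output
```

A caller could wrap a potentially hanging command such as lsof in this helper and treat a `None` exit status as "command timed out" rather than waiting forever.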

People

Assignee: Andrew Kurth
Reporter: Andrew Kurth
Votes: 0
Watchers: 1

Dates

Created: 14/Jan/16 21:18
Updated: 01/Feb/17 18:28
Resolved: 09/Feb/16 15:27