[HDFS-3590] Print a WARN if the edit log sync period takes more than X time units - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: namenode
Labels: None

Description

If a logSync operation, which runs for calls such as FS#create() after the edit has been applied to the NN metadata, takes longer than X seconds, we should log a WARN naming the volume that most likely caused the delay. I'd suggest a threshold of about a minute: a sync that takes that long almost certainly means something is badly wrong with the volume it got stuck on. When an NN writes its edits to multiple NFS volumes, such a log would help identify which particular volume was at fault, since there are no per-NN-dir metrics of any kind.
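A minimal sketch of the proposed check, in Java since the NameNode is written in Java. Everything here is hypothetical: `SlowSyncWarner`, its `Journal` interface and `syncAll` are illustrative names, not the NN's actual edit log classes, and `java.util.logging` stands in for whatever logger the NN really uses. The idea it shows is to time the flush of each edits directory separately and log a WARN naming every directory whose own flush exceeded the threshold.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not HDFS code): flushes each edits directory in turn,
 * timing every flush separately so that a slow volume can be named in a WARN.
 */
class SlowSyncWarner {
    private static final Logger LOG = Logger.getLogger(SlowSyncWarner.class.getName());

    /** Hypothetical stand-in for one edits directory, e.g. one NFS mount. */
    interface Journal {
        void flush() throws IOException;
    }

    private final long warnThresholdMs;

    SlowSyncWarner(long warnThresholdMs) {
        this.warnThresholdMs = warnThresholdMs;
    }

    /**
     * Flushes every journal in the map's iteration order and returns the
     * locations whose individual flush took longer than the threshold.
     */
    List<String> syncAll(Map<String, Journal> journals) throws IOException {
        List<String> slow = new ArrayList<>();
        for (Map.Entry<String, Journal> e : journals.entrySet()) {
            long start = System.nanoTime();
            e.getValue().flush();
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (elapsedMs > warnThresholdMs) {
                slow.add(e.getKey());
                // Name the directory so an operator can tell which mount stalled.
                LOG.warning("Sync of edits directory " + e.getKey() + " took "
                        + elapsedMs + " ms (threshold " + warnThresholdMs + " ms)");
            }
        }
        return slow;
    }
}
```

With the roughly one-minute threshold suggested above, one would construct `new SlowSyncWarner(60_000)`. Timing each directory on its own, rather than the logSync call as a whole, is what lets the WARN point at a specific volume. Note that this check only fires once the flush returns: a hung hard-mounted NFS volume blocks the call, so the WARN appears after the volume recovers, which is the case described here. Warning while the sync is still stuck would need a separate watchdog thread.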

I ran into a situation today where a hard-mounted NFS mount point hung for over X minutes, yet after it recovered there was no indication in the NN's logs that such an event had happened (recovering so late caused its own slew of issues, for which I'll file separate improvement JIRAs). The only sign was a spike in the Sync (Journal Sync) metric, whose elapsed sync time value kept rising. A log message would have saved time investigating this, and would possibly also have pinpointed the bad location more accurately.

Issue Links

is related to: HDFS-6110 adding more slow action log in critical write path (Closed)

People

Assignee: Unassigned

Reporter: Harsh J

Votes: 0

Watchers: 10

Dates

Created: 02/Jul/12 17:54

Updated: 17/Mar/14 17:18