[HDFS-7609] Avoid retry cache collision when Standby NameNode loading edits - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.6.1, 2.8.0, 2.7.2, 3.0.0-alpha1
Component/s: namenode
Labels:
- 2.6.1-candidate
- 2.7.2-candidate

Hadoop Flags: Reviewed

Description

One day my NameNode crashed because two JournalNodes timed out at the same time under very high load, leaving about 100 million transactions in the edit log. (I still have no idea why they were not rolled into the fsimage.)
I tried to restart the NameNode, but it indicated that the restart would take almost 20 hours, and it spent most of that time loading edits. I also tried restarting the NameNode in recovery mode, but the loading speed was no different.
I looked at the stack trace and concluded that the slowdown was caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart finished in half an hour.
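
For reference, the workaround described above would look roughly like the following in hdfs-site.xml. This is only a sketch: the property name is the one given in this report, while the choice of file and the description text are assumptions. Because the retry cache exists to handle retried non-idempotent client requests, this is probably best treated as a temporary setting for getting through a long edit-log replay rather than a permanent one.

```xml
<!-- Sketch of the workaround from this report; assumes the setting goes in
     hdfs-site.xml on the NameNode being restarted. -->
<property>
  <name>dfs.namenode.enable.retrycache</name>
  <value>false</value>
  <description>Temporarily disable the NameNode retry cache while replaying
  a very large edit log. Revert to the default (true) afterwards.</description>
</property>
```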

I think the retry cache is useless during startup, or at least during the recovery process.

Attachments

- HDFS-7609.patch, 04/May/15 18:13, 21 kB, Ming Ma
- HDFS-7609-2.patch, 26/May/15 23:52, 27 kB, Ming Ma
- HDFS-7609-3.patch, 29/May/15 03:22, 22 kB, Ming Ma
- HDFS-7609-branch-2.7.2.txt, 10/Sep/15 19:52, 23 kB, Vinod Kumar Vavilapalli
- HDFS-7609-CreateEditsLogWithRPCIDs.patch, 22/Jan/15 23:56, 2 kB, Chris Nauroth
- recovery_do_not_use_retrycache.patch, 15/Jan/15 02:03, 2 kB, Carrey Zhan

Issue Links

duplicates: HDFS-10246 Standby NameNode dfshealth.jsp Response very slow (Resolved)

People

Assignee: Ming Ma
Reporter: Carrey Zhan
Votes: 0
Watchers: 18

Dates

Created: 14/Jan/15 08:38
Updated: 06/Jan/17 00:46
Resolved: 29/May/15 18:14