[HDFS-1594] When the disk becomes full Namenode is getting shutdown and not able to recover - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.21.0, 0.21.1, 0.22.0
Fix Version/s: 0.23.0
Component/s: namenode
Labels:
None
Environment:

Linux linux124 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux

Hadoop Flags:

Reviewed
Release Note:

Hide
Implemented a daemon thread to monitor the disk usage for periodically and if the disk usage reaches the threshold value, put the name node into Safe mode so that no modification to file system will occur. Once the disk usage reaches below the threshold, name node will be put out of the safe mode. Here threshold value and interval to check the disk usage are configurable.

Show
Implemented a daemon thread to monitor the disk usage for periodically and if the disk usage reaches the threshold value, put the name node into Safe mode so that no modification to file system will occur. Once the disk usage reaches below the threshold, name node will be put out of the safe mode. Here threshold value and interval to check the disk usage are configurable.

Description

When the disk becomes full name node is shutting down and if we try to start after making the space available It is not starting and throwing the below exception.

 

2011-01-24 23:23:33,727 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
	at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
2011-01-24 23:23:33,729 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
	at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)

2011-01-24 23:23:33,730 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at linux124/10.18.52.124
************************************************************/

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HDFS-1594.patch
24/Jan/11 13:55
103 kB
Devaraj Kavali
HDFS-1594.patch
15/Feb/11 20:54
17 kB
Konstantin I Boudnik
HDFS-1594.patch
16/Feb/11 01:43
18 kB
Konstantin I Boudnik
hdfs-1594.6.patch
22/Apr/11 08:06
24 kB
Aaron Myers
hdfs-1594.5.patch
21/Apr/11 23:18
24 kB
Aaron Myers
hdfs-1594.4.patch
21/Apr/11 06:34
23 kB
Aaron Myers
hdfs-1594.3.patch
18/Apr/11 23:53
23 kB
Aaron Myers
hdfs-1594.2.patch
12/Apr/11 19:21
19 kB
Aaron Myers
hdfs-1594.1.patch
12/Apr/11 04:12
19 kB
Aaron Myers
hdfs-1594.0.patch
12/Apr/11 01:38
0.8 kB
Aaron Myers
hadoop-root-namenode-linux124.log
24/Jan/11 13:33
36 kB
Devaraj Kavali

Issue Links

breaks

HDFS-4862 SafeModeInfo.isManual() returns true when resources are low even if it wasn't entered into manually

Closed

depends upon

HDFS-1726 query method for what kind of safe mode the Namenode is in

Resolved

is blocked by

HDFS-2422 The NN should tolerate the same number of low-resource volumes as failed volumes

Closed

is related to

HDFS-1566 Test that covers full partition

Resolved

relates to

HDFS-1726 query method for what kind of safe mode the Namenode is in

Resolved

When the disk becomes full Namenode is getting shutdown and not able to recover

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates