
Key: HADOOP-2873
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: dhruba borthakur
Reporter: André Martin
Votes: 0
Watchers: 0
Hadoop Common

Namenode fails to re-start after cluster shutdown - DFSClient: Could not obtain blocks even though all datanodes were up & live

Created: 22/Feb/08 10:44 AM   Updated: 08/Jul/09 04:42 PM
Component/s: None
Affects Version/s: 0.17.0
Fix Version/s: 0.17.0

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
leaseConstruction.patch   2008-02-25 09:03 PM   dhruba borthakur   5 kB
leaseConstruction.patch   2008-02-22 11:13 PM   dhruba borthakur   5 kB
leaseConstruction.patch   2008-02-22 10:04 PM   dhruba borthakur   5 kB
leaseConstruction.patch   2008-02-22 06:14 PM   dhruba borthakur   2 kB

Hadoop Flags: Incompatible change
Resolution Date: 25/Feb/08 09:07 PM


Description
Namenode fails to re-start with the following exception:

2008-02-21 14:20:48,831 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = se09/141.76.xxx.xxx
STARTUP_MSG: args = []
STARTUP_MSG: version = 2008-02-19_11-01-48
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/trunk -r 628999; compiled by 'hudson' on Tue Feb 19 11:09:05 UTC 2008
************************************************************/
2008-02-21 14:20:49,367 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing RPC Metrics with serverName=NameNode, port=8000
2008-02-21 14:20:49,374 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: se09.inf.tu-dresden.de/141.76.xxx.xxx:8000
2008-02-21 14:20:49,378 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2008-02-21 14:20:49,381 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2008-02-21 14:20:49,501 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=amartin,students
2008-02-21 14:20:49,501 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
2008-02-21 14:20:49,501 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
2008-02-21 14:20:49,788 INFO org.apache.hadoop.ipc.Server: Stopping server on 8000
2008-02-21 14:20:49,790 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: Created 13 leases but found 4
at org.apache.hadoop.dfs.FSImage.loadFilesUnderConstruction(FSImage.java:935)
at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:749)
at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:634)
at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:223)
at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:79)
at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:261)
at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:242)
at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:131)
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:176)
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:162)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:851)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:860)

2008-02-21 14:20:49,791 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at se09/141.76.xxx.xxx
************************************************************/

A cluster restart was needed because the DFS client produced the following error message even though all datanodes were up:

08/02/21 14:04:35 INFO fs.DFSClient: Could not obtain block blk_-4008950704646490788 from any node: java.io.IOException: No live nodes contain current block



Raghu Angadi added a comment - 22/Feb/08 05:02 PM
This is most likely related to HADOOP-2345.

Raghu Angadi added a comment - 22/Feb/08 05:14 PM
FSNamesystem.saveFilesUnderConstruction() and FSImage.loadFilesUnderConstruction() don't seem to match.

FSImage.loadFilesUnderConstruction() assumes there is only one file per lease.
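
The mismatch described above can be illustrated with a small toy model. This is only a sketch and not the Hadoop code: the class LeaseCountSketch, its methods, and the sample paths and client names are all hypothetical. It assumes a lease table that keeps one lease per client, and a loader that expects one lease per file record, which fails as soon as a client holds more than one open file.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

class LeaseCountSketch {
    /** Counts leases as a per-client lease table would: one per distinct holder. */
    static int countLeases(Map<String, String> holderByPath) {
        return new HashSet<>(holderByPath.values()).size();
    }

    /** Toy loader that assumes one file per lease and verifies that assumption. */
    static void loadAssumingOneFilePerLease(Map<String, String> holderByPath)
            throws IOException {
        int created = holderByPath.size();       // one lease expected per file record
        int found = countLeases(holderByPath);   // leases that actually exist afterwards
        if (created != found) {
            throw new IOException("Created " + created + " leases but found " + found);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data: client-1 has two files open, client-2 has one.
        Map<String, String> holderByPath = new HashMap<>();
        holderByPath.put("/a/part-0", "client-1");
        holderByPath.put("/a/part-1", "client-1");
        holderByPath.put("/b/part-0", "client-2");
        try {
            loadAssumingOneFilePerLease(holderByPath);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // Created 3 leases but found 2
        }
    }
}
```

In this model the check only passes when every client has exactly one file open, which matches the symptom in the stack trace (13 versus 4).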


Raghu Angadi added a comment - 22/Feb/08 05:18 PM
Andre,

as a temporary hack, you can just comment out FSImage.java:749 and your restart should work, since these are the last entries read from the image file.


dhruba borthakur added a comment - 22/Feb/08 06:14 PM
Store the number of files under construction rather than the number of leases.
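
A minimal sketch of that idea, assuming a simplified record layout of (path, holder) pairs; it is not the patch itself, and the class FilesUnderConstructionSketch and its methods are hypothetical. The writer stores the number of files, so the reader knows exactly how many records follow and can rebuild the leases by grouping files per holder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class FilesUnderConstructionSketch {
    /** Writes the number of files (not leases), then one (path, holder) pair per file. */
    static byte[] save(Map<String, Set<String>> pathsByHolder) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            int fileCount = 0;
            for (Set<String> paths : pathsByHolder.values()) {
                fileCount += paths.size();
            }
            out.writeInt(fileCount);
            for (Map.Entry<String, Set<String>> lease : pathsByHolder.entrySet()) {
                for (String path : lease.getValue()) {
                    out.writeUTF(path);
                    out.writeUTF(lease.getKey());
                }
            }
        }
        return bytes.toByteArray();
    }

    /** Reads exactly fileCount records and groups them into one lease per holder. */
    static Map<String, Set<String>> load(byte[] image) throws IOException {
        Map<String, Set<String>> pathsByHolder = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(image))) {
            int fileCount = in.readInt();
            for (int i = 0; i < fileCount; i++) {
                String path = in.readUTF();
                String holder = in.readUTF();
                pathsByHolder.computeIfAbsent(holder, h -> new TreeSet<>()).add(path);
            }
        }
        return pathsByHolder;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data: two leases covering three files.
        Map<String, Set<String>> leases = new TreeMap<>();
        leases.put("client-1", new TreeSet<>(Arrays.asList("/a/part-0", "/a/part-1")));
        leases.put("client-2", new TreeSet<>(Arrays.asList("/b/part-0")));
        Map<String, Set<String>> restored = load(save(leases));
        System.out.println(restored.equals(leases)); // true
    }
}
```

Because the stored count describes file records, the loader no longer needs to assume any particular ratio of files to leases.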

dhruba borthakur added a comment - 22/Feb/08 06:15 PM
Must fix for 0.17. This was a regression introduced by HADOOP-2345.

Raghu Angadi added a comment - 22/Feb/08 06:24 PM
Do we need to increase the layout version? And to protect the users that upgraded to current trunk, we could change FSImage.java:914 to {{ if (version >= -12) ... }}.

Konstantin Shvachko added a comment - 22/Feb/08 07:33 PM
The layout version was increased by HADOOP-2345.
I think it should be promoted to 0.16.1. People will complain all the time.

Konstantin Shvachko added a comment - 22/Feb/08 08:26 PM
Sorry, it was committed in 0.17, so the bug does not exist in 0.16.
I guess I wanted to say it should be fixed asap.

dhruba borthakur added a comment - 22/Feb/08 10:04 PM
Unit test. Bump layout version.

Raghu Angadi added a comment - 22/Feb/08 10:14 PM
There is a temporary log message left in the patch. Other than that, +1. It fixes the above problem.


dhruba borthakur added a comment - 22/Feb/08 11:13 PM
Removed debug log message.

dhruba borthakur added a comment - 25/Feb/08 07:27 PM
I am going to commit this patch because it fixes a very serious problem with the Namenode transaction log. I have waited 3 days for the HadoopQA tests to run, but they have not run yet. I think it is better to check in this fix sooner rather than wait for HadoopQA to run.

dhruba borthakur added a comment - 25/Feb/08 09:03 PM
Merged with latest trunk.

dhruba borthakur added a comment - 25/Feb/08 09:07 PM
I just committed this.

Hudson added a comment - 26/Feb/08 12:04 PM

Robert Chansler added a comment - 14/Apr/08 04:30 PM
Noted as incompatible in changes.txt