[HADOOP-210] Namenode not able to accept connections - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.4.0
Component/s: None
Labels: None
Environment: linux

Description

I am running Owen's random writer on a 627-node cluster (writing 10 GB per node). After running for a while (map 12%, reduce 1%), I get the following error on the NameNode:

Exception in thread "Server listener on port 60000" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:574)
at org.apache.hadoop.ipc.Server$Listener.run(Server.java:105)
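
The JVM reports this error when the operating system refuses to give it another native thread, which is a limit separate from the Java heap. The following is a minimal diagnostic sketch, not part of Hadoop, that prints heap usage next to the live thread count so the two can be compared. The class name, the helper `estimateStackReservationBytes`, and the 512 KiB per-thread stack size are assumptions for illustration; the real stack size depends on the platform and on the `-Xss` setting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic helper; not part of Hadoop.
final class ThreadFootprintSketch {

    /** Rough memory reserved for thread stacks: thread count times stack size. */
    static long estimateStackReservationBytes(int threads, long stackSizeBytes) {
        return (long) threads * stackSizeBytes;
    }

    public static void main(String[] args) {
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        Runtime rt = Runtime.getRuntime();

        // Heap currently in use by this JVM.
        long heapUsedBytes = rt.totalMemory() - rt.freeMemory();

        // ASSUMPTION: 512 KiB per thread stack; check -Xss and the platform default.
        long assumedStackBytes = 512L * 1024;

        int liveThreads = threadBean.getThreadCount();
        long stackBytes = estimateStackReservationBytes(liveThreads, assumedStackBytes);

        System.out.printf("heap used: %d MiB%n", heapUsedBytes >> 20);
        System.out.printf("live threads: %d (peak %d)%n",
                liveThreads, threadBean.getPeakThreadCount());
        System.out.printf("estimated stack reservation: %d MiB%n", stackBytes >> 20);
    }
}
```

A small heap figure combined with a very large thread count would point toward native thread limits (stack memory or per-user thread caps) rather than heap exhaustion.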

After this error, the NameNode does not seem to accept connections from any of the clients. All DFSClient calls time out. Here is a trace from one of them:
java.net.SocketTimeoutException: timed out waiting for rpc response
at org.apache.hadoop.ipc.Client.call(Client.java:305)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:149)
at org.apache.hadoop.dfs.$Proxy1.open(Unknown Source)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:419)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:406)
at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:171)
at org.apache.hadoop.dfs.DistributedFileSystem.openRaw(DistributedFileSystem.java:78)
at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:228)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:157)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:43)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:105)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:785)

After the OutOfMemoryError is thrown, the NameNode sits at around 1% CPU utilization. I have profiled the NameNode and its heap usage peaks at around 57 MB, which is not much, so heap size does not seem to be the problem. Could it be happening due to a lack of stack space? Any pointers?
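
The first trace shows the listener failing inside `Thread.start`, which suggests it starts a new thread while handling connections, and the attachment names on this issue (`nio.patch`) suggest a `java.nio`-based change. The following is a minimal sketch of the general selector technique, in which one thread accepts and tracks many connections. It is not the attached patch; the class and method names are hypothetical, and the sockets are opened locally only to make the example self-contained.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of a single-threaded selector listener; not Hadoop code.
final class SelectorListenerSketch {

    /** Connects `clients` local sockets and accepts all of them on the calling thread. */
    static int acceptOnOneThread(int clients) {
        List<Socket> clientSockets = new ArrayList<>();
        List<SocketChannel> acceptedChannels = new ArrayList<>();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for a free port; the backlog is sized to the client count.
            server.bind(new InetSocketAddress("127.0.0.1", 0), clients);
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

            // Simulated clients; their connections wait in the accept queue.
            for (int i = 0; i < clients; i++) {
                clientSockets.add(new Socket("127.0.0.1", port));
            }

            while (acceptedChannels.size() < clients) {
                if (selector.select(5000) == 0) {
                    break; // nothing became ready within the timeout
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isAcceptable()) {
                        continue;
                    }
                    SocketChannel ch;
                    // Drain every pending connection without starting a thread for it.
                    while ((ch = server.accept()) != null) {
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ);
                        acceptedChannels.add(ch);
                    }
                }
            }
            return acceptedChannels.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (SocketChannel ch : acceptedChannels) closeQuietly(ch);
            for (Socket s : clientSockets) closeQuietly(s);
        }
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // best-effort cleanup in a sketch
        }
    }

    public static void main(String[] args) {
        System.out.println("accepted=" + acceptOnOneThread(8));
    }
}
```

With this approach the number of native threads stays fixed regardless of how many clients connect, so the listener does not depend on the OS granting one thread per connection.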

Attachments

nio.new.patch, 21/Jun/06 11:57, 21 kB, Devaraj Das
nio.patch, 13/Jun/06 15:28, 19 kB, Devaraj Das
nio.patch, 13/Jun/06 02:34, 19 kB, Devaraj Das

People

Assignee: Devaraj Das
Reporter: Mahadev Konar
Votes: 0
Watchers: 1

Dates

Created: 11/May/06 07:09
Updated: 08/Jul/09 16:41
Resolved: 22/Jun/06 01:15