Hadoop Common / HADOOP-210

Namenode not able to accept connections


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels: None
    • Environment: linux

    Description

      I am running Owen's random writer on a 627-node cluster (writing 10 GB per node). After it has been running for a while (map 12%, reduce 1%), I get the following error on the NameNode:

      Exception in thread "Server listener on port 60000" java.lang.OutOfMemoryError: unable to create new native thread
      at java.lang.Thread.start0(Native Method)
      at java.lang.Thread.start(Thread.java:574)
      at org.apache.hadoop.ipc.Server$Listener.run(Server.java:105)
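
The trace shows the "Server listener on port 60000" thread itself calling `Thread.start()` (`Server$Listener.run`, Server.java:105), which suggests the listener starts a new thread for each incoming connection. The following is a hypothetical, simplified illustration of that thread-per-connection pattern, not Hadoop's actual `ipc.Server` code; the class `ThreadPerConnectionListener` and its `handlersStarted` counter are invented for the example. Because every open connection pins one native thread (and its stack), the thread count grows with the number of connected clients until `Thread.start()` fails.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical, simplified thread-per-connection listener (NOT Hadoop's code).
 * Every accepted socket gets its own Thread, and each Thread needs its own
 * native stack, so N open connections mean N extra native threads.
 */
class ThreadPerConnectionListener implements Runnable {
    private final ServerSocket serverSocket;
    // Number of handler threads started so far (for illustration and testing).
    final AtomicInteger handlersStarted = new AtomicInteger();

    ThreadPerConnectionListener(ServerSocket serverSocket) {
        this.serverSocket = serverSocket;
    }

    @Override
    public void run() {
        while (!serverSocket.isClosed()) {
            try {
                Socket socket = serverSocket.accept();
                // One new native thread per connection: this start() is the kind
                // of call that fails with "unable to create new native thread"
                // once the process cannot create another thread.
                Thread handler = new Thread(() -> handle(socket), "handler-" + socket.getPort());
                handler.setDaemon(true);
                handler.start();
                handlersStarted.incrementAndGet();
            } catch (IOException e) {
                return; // server socket was closed; stop listening
            }
        }
    }

    private void handle(Socket socket) {
        try (Socket s = socket) {
            // A real RPC server would read and dispatch requests here; the sketch
            // just blocks until the client disconnects, keeping the thread alive.
            while (s.getInputStream().read() != -1) {
                // discard bytes
            }
        } catch (IOException ignored) {
            // connection reset: the handler thread simply exits
        }
    }
}
```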

      After this, the NameNode no longer seems to accept connections from any client, and all DFSClient calls time out. Here is the trace for one of them:

      java.net.SocketTimeoutException: timed out waiting for rpc response
      at org.apache.hadoop.ipc.Client.call(Client.java:305)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:149)
      at org.apache.hadoop.dfs.$Proxy1.open(Unknown Source)
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:419)
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:406)
      at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:171)
      at org.apache.hadoop.dfs.DistributedFileSystem.openRaw(DistributedFileSystem.java:78)
      at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
      at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:228)
      at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:157)
      at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:43)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:105)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:785)
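
These client-side failures are what one would expect if the server stops servicing requests: a caller blocked in `Client.call` waits for a response that never arrives and eventually gives up. A minimal sketch of such a wait-with-deadline, assuming a hypothetical `PendingCall` slot that a receiver thread completes (not the real `org.apache.hadoop.ipc.Client` internals), reusing the exception message from the trace:

```java
import java.net.SocketTimeoutException;

/**
 * Hypothetical client-side call slot (NOT Hadoop's Client class).
 * The caller blocks until a response is delivered or the timeout elapses.
 */
class PendingCall {
    private Object result;   // set by the connection's receiver thread
    private boolean done;    // true once a response has been delivered

    /** Called by the (hypothetical) receiver thread when the response arrives. */
    synchronized void complete(Object value) {
        result = value;
        done = true;
        notifyAll();
    }

    /** Waits at most timeoutMillis for the response, then gives up. */
    synchronized Object await(long timeoutMillis)
            throws SocketTimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        // Loop to guard against spurious wake-ups from wait().
        while (!done) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                throw new SocketTimeoutException("timed out waiting for rpc response");
            }
            wait(remaining);
        }
        return result;
    }
}
```

If the NameNode's listener has died, no receiver ever calls `complete`, so every caller ends in the timeout branch regardless of how healthy the client is.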

      At this point (after the OutOfMemoryError has been thrown), the NameNode sits at around 1% CPU utilization. I profiled the NameNode, and its heap peaks at around 57 MB, which is not much, so heap size does not seem to be the problem. Could it be caused by a lack of stack space? Any pointers?
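
Heap size and thread creation are governed by different limits: thread stacks are reserved in native memory outside the Java heap, so a 57 MB heap does not rule out running out of room for new threads. On Linux, the per-user process limit (`ulimit -u`), which also counts threads, can trigger "unable to create new native thread" as well. The following diagnostic sketch reports heap usage and the live thread count through the standard `Runtime` and `java.lang.management.ThreadMXBean` APIs and multiplies the thread count by an assumed per-thread stack size. `ThreadMemoryEstimate` is a hypothetical helper, and the stack size is an input you supply (for example, whatever `-Xss` is set to), not something the code measures.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/**
 * Hypothetical diagnostic helper. The Java heap and thread stacks are separate
 * memory pools; the per-thread stack size is an ASSUMED input, not measured.
 */
class ThreadMemoryEstimate {
    /** Estimated native memory reserved for thread stacks, in bytes. */
    static long estimatedStackBytes(int liveThreads, long stackSizeBytes) {
        return (long) liveThreads * stackSizeBytes;
    }

    static void printReport(long assumedStackSizeBytes) {
        Runtime rt = Runtime.getRuntime();
        long usedHeap = rt.totalMemory() - rt.freeMemory();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int live = threads.getThreadCount();
        int peak = threads.getPeakThreadCount();
        System.out.printf("heap used:          %,d bytes%n", usedHeap);
        System.out.printf("live threads:       %d (peak %d)%n", live, peak);
        System.out.printf("stack reservation: ~%,d bytes (at %,d bytes/thread)%n",
                estimatedStackBytes(live, assumedStackSizeBytes), assumedStackSizeBytes);
    }

    public static void main(String[] args) {
        // 512 KB per stack is an illustrative assumption, not a JVM default claim.
        printReport(512L * 1024);
    }
}
```

For example, 2,000 threads at an assumed 512 KB each reserve 1,048,576,000 bytes (about 1 GB) of address space on top of the heap, a large share of what a 32-bit JVM process can address. The 2,000 figure is hypothetical, not measured on this cluster.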

      Attachments

        1. nio.patch
          19 kB
          Devaraj Das
        2. nio.patch
          19 kB
          Devaraj Das
        3. nio.new.patch
          21 kB
          Devaraj Das
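
The attached patches are named `nio.patch` and `nio.new.patch`, which suggests the fix moved the listener to `java.nio` non-blocking I/O. As an illustration of that technique only (not the contents of the patch), a single thread can use a `Selector` to multiplex accepts and reads for every client, so the server's thread count stops growing with the number of connections. `SelectorListener` and `connectionsAccepted` are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

/**
 * Hypothetical selector-based listener (NOT the attached patch). One thread
 * handles accept and read readiness for every client connection.
 */
class SelectorListener implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    volatile int connectionsAccepted; // only the listener thread writes this

    SelectorListener(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port)); // port 0 = any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try {
            while (selector.isOpen()) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client == null) continue; // nothing pending after all
                        client.configureBlocking(false);
                        // Register the connection instead of starting a thread for it.
                        client.register(selector, SelectionKey.OP_READ);
                        connectionsAccepted++;
                    } else if (key.isReadable()) {
                        drain(key, buf);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException | CancelledKeyException e) {
            // selector or server channel closed: the listener exits
        }
    }

    /** Reads available bytes; a real server would parse and dispatch requests. */
    private static void drain(SelectionKey key, ByteBuffer buf) {
        SocketChannel client = (SocketChannel) key.channel();
        buf.clear();
        try {
            if (client.read(buf) < 0) { // client closed the connection
                key.cancel();
                client.close();
            }
        } catch (IOException e) {       // e.g. connection reset by peer
            key.cancel();
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    void close() throws IOException {
        selector.close(); // wakes a thread blocked in select()
        server.close();
    }
}
```

A production server would still need to frame requests and hand them to a bounded pool of worker threads; the point of the sketch is only that idle connections no longer each pin a native thread.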


          People

            Assignee: Devaraj Das (ddas)
            Reporter: Mahadev Konar (mahadev)
            Votes: 0
            Watchers: 0
