Hadoop Map/Reduce / MAPREDUCE-5606

JobTracker blocked for DFSClient: Failed recovery attempt

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.0.3
    • Fix Version/s: None
    • Component/s: jobtracker
    • Labels: None
    • Environment: CentOS 5.8, JDK 1.7

    • Release Note:
      JobTracker's logs:

      2013-11-04 12:29:24,651 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #0 from primary datanode xx.xx.23.48:1004
              at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
              at org.apache.hadoop.ipc.Client.call(Client.java:1075)
              at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
              at $Proxy8.recoverBlock(Unknown Source)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3121)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2589)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
              at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
              at java.io.DataInputStream.read(DataInputStream.java:132)
              at org.apache.hadoop.security.SaslInputStream.read(SaslInputStream.java:239)
              at org.apache.hadoop.ipc.Client$Connection.run(Client.java:749)
      2013-11-04 12:29:25,661 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #1 from primary datanode xx.xx.23.48:1004
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1884)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2047)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:601)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
      2013-11-04 12:29:26,669 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #2 from primary datanode xx.xx.23.48:1004
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1884)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2047)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:601)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
      2013-11-04 12:29:27,676 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #3 from primary datanode xx.xx.23.48:1004
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1884)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2047)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:601)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
      2013-11-04 12:29:28,678 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 bad datanode[0] xx.xx.24.115:1004
      2013-11-04 12:29:28,678 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 in pipeline xx.xx.24.115:1004, xx.xx.23.48:1004, xx.xx.23.82:1004: bad datanode xx.xx.24.115:1004
      2013-11-04 12:29:28,683 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #4 from primary datanode xx.xx.23.48:1004
      org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block blk_4347580689772316942_201043980 is already being recovered, ignoring this request to recover it.
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1884)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2047)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:601)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

              at org.apache.hadoop.ipc.Client.call(Client.java:1070)
              at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
              at $Proxy8.recoverBlock(Unknown Source)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3121)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2589)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
      2013-11-04 12:29:28,683 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 failed because recovery from primary datanode xx.xx.23.48:1004 failed 5 times. Pipeline was xx.xx.24.115:1004, xx.xx.23.48:1004, xx.xx.23.82:1004. Will retry...
      2013-11-04 12:29:29,685 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 bad datanode[0] xx.xx.24.115:1004
      2013-11-04 12:29:29,685 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 in pipeline xx.xx.24.115:1004, xx.xx.23.48:1004, xx.xx.23.82:1004: bad datanode xx.xx.24.115:1004
      2013-11-04 12:29:29,689 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #5 from primary datanode xx.xx.23.48:1004
      org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block blk_4347580689772316942_201043980 is already being recovered, ignoring this request to recover it.
                                                                                                                                     
      2013-11-04 12:29:28,678 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 bad datanode[0] xx.xx.24.115:1004
      2013-11-04 12:29:28,678 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 in pipeline xx.xx.24.115:1004, xx.xx.23.48:1004, xx.xx.23.82:1004: bad datanode xx.xx.24.115:1004
      2013-11-04 12:29:28,683 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #4 from primary datanode xx.xx.23.48:1004
      org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block blk_4347580689772316942_201043980 is already being recovered, ignoring this request to recover it.
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1884)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2047)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:601)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

              at org.apache.hadoop.ipc.Client.call(Client.java:1070)
              at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
              at $Proxy8.recoverBlock(Unknown Source)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3121)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2589)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
      2013-11-04 12:29:28,683 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 failed because recovery from primary datanode xx.xx.23.48:1004 failed 5 times. Pipeline was xx.xx.24.115:1004, xx.xx.23.48:1004, xx.xx.23.82:1004. Will retry...
      2013-11-04 12:29:29,685 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 bad datanode[0] xx.xx.24.115:1004
      2013-11-04 12:29:29,685 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 in pipeline xx.xx.24.115:1004, xx.xx.23.48:1004, xx.xx.23.82:1004: bad datanode xx.xx.24.115:1004
      2013-11-04 12:29:29,689 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #5 from primary datanode xx.xx.23.48:1004
      org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block blk_4347580689772316942_201043980 is already being recovered, ignoring this request to recover it.
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1884)
                                                                                                                                      
      2013-11-04 12:29:28,678 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 bad datanode[0] xx.xx.24.115:1004
      2013-11-04 12:29:28,678 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 in pipeline xx.xx.24.115:1004, xx.xx.23.48:1004, xx.xx.23.82:1004: bad datanode xx.xx.24.115:1004
      2013-11-04 12:29:28,683 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #4 from primary datanode xx.xx.23.48:1004
      org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block blk_4347580689772316942_201043980 is already being recovered, ignoring this request to recover it.
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1884)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2047)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:601)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

              at org.apache.hadoop.ipc.Client.call(Client.java:1070)
              at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
              at $Proxy8.recoverBlock(Unknown Source)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3121)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2589)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
      2013-11-04 12:29:28,683 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 failed because recovery from primary datanode xx.xx.23.48:1004 failed 5 times. Pipeline was xx.xx.24.115:1004, xx.xx.23.48:1004, xx.xx.23.82:1004. Will retry...
      2013-11-04 12:29:29,685 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 bad datanode[0] xx.xx.24.115:1004
      2013-11-04 12:29:29,685 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 in pipeline xx.xx.24.115:1004, xx.xx.23.48:1004, xx.xx.23.82:1004: bad datanode xx.xx.24.115:1004
      2013-11-04 12:29:29,689 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #5 from primary datanode xx.xx.23.48:1004
      org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block blk_4347580689772316942_201043980 is already being recovered, ignoring this request to recover it.
      2013-11-04 12:29:24,651 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #0 from primary datanode xx.xx.23.48:1004
              at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
              at org.apache.hadoop.ipc.Client.call(Client.java:1075)
              at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2589)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
              at java.io.DataInputStream.read(DataInputStream.java:132)
              at org.apache.hadoop.security.SaslInputStream.read(SaslInputStream.java:239)
              at java.io.FilterInputStream.read(FilterInputStream.java:116)
              at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:342)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
              at java.io.DataInputStream.readInt(DataInputStream.java:370)
              at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:804)
              at org.apache.hadoop.ipc.Client$Connection.run(Client.java:749)
      2013-11-04 12:29:25,654 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 bad datanode[0] xx.xx.24.115:1004
      2013-11-04 12:29:25,661 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #1 from primary datanode xx.xx.23.48:1004
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1884)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2047)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

              at org.apache.hadoop.ipc.Client.call(Client.java:1070)
              at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
              at $Proxy8.recoverBlock(Unknown Source)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3121)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2589)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
      2013-11-04 12:29:26,663 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 bad datanode[0] xx.xx.24.115:1004
      2013-11-04 12:29:26,669 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #2 from primary datanode xx.xx.23.48:1004
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1884)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2047)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

              at org.apache.hadoop.ipc.Client.call(Client.java:1070)
              at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
              at $Proxy8.recoverBlock(Unknown Source)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3121)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2589)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
      2013-11-04 12:29:27,671 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 bad datanode[0] xx.xx.24.115:1004
      2013-11-04 12:29:27,676 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #3 from primary datanode xx.xx.23.48:1004
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1884)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2047)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

              at org.apache.hadoop.ipc.Client.call(Client.java:1070)
              at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
              at $Proxy8.recoverBlock(Unknown Source)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3121)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2589)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
      2013-11-04 12:29:28,678 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 bad datanode[0] xx.xx.24.115:1004
      2013-11-04 12:29:28,683 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #4 from primary datanode xx.xx.23.48:1004
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1884)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2047)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:601)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
              at org.apache.hadoop.ipc.Client.call(Client.java:1070)
              at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
              at $Proxy8.recoverBlock(Unknown Source)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3121)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2589)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
      2013-11-04 12:29:29,685 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_4347580689772316942_201043980 bad datanode[0] xx.xx.24.115:1004
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
              at org.apache.hadoop.ipc.Client.call(Client.java:1070)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3121)
      2013-11-04 12:30:22,883 INFO org.mortbay.log: index:34
      2013-11-04 12:30:59,778 INFO org.mortbay.log: index:34
      2013-11-04 12:31:16,161 INFO org.mortbay.log: index:34
      2013-11-04 12:30:30,692 WARN org.apache.hadoop.ipc.Client: Encountered java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xx.xx.23.162:30348 remote=/xx.xx.23.82:50020] while trying to establish SASL connection to the server. Will retry SASL connection to server with principal hadoop/a082@DATA
      2013-11-04 12:30:33,375 INFO org.mortbay.log: index:34
      2013-11-04 12:30:33,771 INFO org.mortbay.log: index:34
      2013-11-04 12:30:59,778 INFO org.mortbay.log: index:34
      2013-11-04 12:31:09,851 INFO org.mortbay.log: index:40
      2013-11-04 12:31:15,345 INFO org.mortbay.log: index:34
      2013-11-04 12:31:16,161 INFO org.mortbay.log: index:34
      2013-11-04 12:31:16,635 INFO org.mortbay.log: index:34
      2013-11-04 12:31:26,504 INFO org.mortbay.log: index:40
      2013-11-04 12:31:26,521 INFO org.mortbay.log: index:40
      2013-11-04 12:31:34,674 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:blk_4347580689772316942_201043980 cause:java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xx.xx.23.162:30438 remote=/xx.xx.23.82:50020]
      2013-11-04 12:31:34,674 WARN org.apache.hadoop.ipc.Client: Encountered java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xx.xx.23.162:30438 remote=/xx.xx.23.82:50020] while trying to establish SASL connection to the server. Will retry SASL connection to server with principal hadoop/a082@DATA
      2013-11-04 12:32:37,666 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:blk_4347580689772316942_201043980 cause:java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xx.xx.23.162:30529 remote=/xx.xx.23.82:50020]
      2013-11-04 12:32:37,666 WARN org.apache.hadoop.ipc.Client: Encountered java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xx.xx.23.162:30529 remote=/xx.xx.23.82:50020] while trying to establish SASL connection to the server. Will retry SASL connection to server with principal hadoop/082@DATA

      Description

      When a datanode crashed, the server still answered ping, but RPC calls to it failed and SSH login was impossible. The JobTracker then happened to request a block on this datanode.
      When that happens, the JobTracker stops working: the web UI is unresponsive, hadoop job -list hangs, and the JobTracker logs show nothing else.

      We then need to restart the datanode.
      After that the JobTracker works again, but the TaskTracker count drops to zero,
      so we need to run: hadoop mradmin -refreshNodes
      The JobTracker then begins to re-add TaskTrackers, but very slowly.

      This problem has occurred 5 times in 2 weeks.

        Activity

        Chris Nauroth added a comment -

        I'm reopening this. There is an actual bug here (holding a global lock in the JT while doing I/O). Despite the config workaround I described, I don't think we can really call it resolved.

        What I'm not sure about is whether this is a duplicate of MAPREDUCE-1144. If anyone on that issue can tell, then we can close this one as a duplicate.
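
        For illustration only (hypothetical names, not the actual JobTracker source), the anti-pattern described above looks roughly like this: a synchronized method performs a blocking HDFS write, so the DFSClient recovery retries and socket timeouts seen in the log are all served while the global lock is held.

        // Hypothetical sketch of the anti-pattern; not JobTracker code.
        // While the synchronized method blocks on remote I/O, every other caller
        // that needs the same monitor (heartbeats, job submission, web UI) waits.
        import java.io.IOException;
        import java.io.OutputStream;
        import java.nio.charset.StandardCharsets;
        import java.util.ArrayList;
        import java.util.List;

        public class TrackerLikeService {
            private final List<String> finishedJobs = new ArrayList<>();

            // Global lock held across the write: if HDFS stalls, the whole service stalls.
            public synchronized void finalizeJob(String jobId, OutputStream historyOut) throws IOException {
                finishedJobs.add(jobId);                                   // cheap in-memory bookkeeping
                historyOut.write(jobId.getBytes(StandardCharsets.UTF_8));  // blocking I/O inside the lock
                historyOut.flush();
            }
        }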

        firegun added a comment -

        Just set it on the JobTracker, and you need to restart the JobTracker!

        viswanathan added a comment -

        Hi,

        Do I need to set this property only in the JobTracker node's mapred config, or across the whole HDFS cluster?

        firegun added a comment -

        <property>
        <name>hadoop.job.history.user.location</name>
        <final>true</final>
        <value>none</value>
        </property>

        firegun added a comment -

        Hi Chris,
        Thanks for your help. We have had the property hadoop.job.history.user.location set to none for a week. Just as you said, it now works well. I am closing this JIRA.

        Chris Nauroth added a comment -

        That config setting can be considered a permanent fix if you're not interested in saving user history to HDFS. OTOH, if you are interested in saving user history to HDFS, then there is no permanent fix yet. That's what this issue tracks.

        viswanathan added a comment -

        Hi Chris,

        Adding the property below hopefully resolves this temporarily; how can it be resolved permanently?

        <property>
        <name>hadoop.job.history.user.location</name>
        <final>true</final>
        <value>none</value>
        </property>

        Earlier we used Hadoop versions 1.0.3 and 1.0.4 and never faced this type of issue. We are only facing it in Hadoop 1.2.1.

        Chris Nauroth added a comment -

        I just noticed an old issue, MAPREDUCE-1144, which appears to be the same thing. Is this a duplicate?

        Chris Nauroth added a comment -

        I've seen this happen in more recent 1.x versions too. In my case, it happened while writing job history files to HDFS. The problem is that this occurs while holding a global lock (inside a synchronized method of the JobTracker object). This prevents the JT from getting other useful work done, like accepting new job submissions or displaying the web UI. You might be able to confirm this by inspecting a thread dump of your JT process while this is happening.

        If your investigation shows the same root cause (blocked writing history files to HDFS), then you can disable this and instead only write history to the local file system. If the configuration parameter hadoop.job.history.location is set to a location on HDFS, then remove this. (It will default to the standard Hadoop log directory on the local file system.)

        There is also hadoop.job.history.user.location. If unspecified, this will default to writing per-job history files in each job's output directory in HDFS. You can disable these files by setting the value to none, like this:

        <property>
          <name>hadoop.job.history.user.location</name>
          <final>true</final>
          <value>none</value>
        </property>
        

        To fix this issue completely, we'd need to move the logic for writing history outside of the JobTracker monitor. Really any kind of I/O performed while holding a global lock is problematic due to the risk of failure.
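
        As a hedged sketch of that direction (hypothetical class and method names, not a proposed patch), the history record could be captured under the lock and written to HDFS on a separate thread, so a stalled datanode delays history files rather than the whole tracker:

        // Illustrative sketch only: record the event under the lock, do the I/O outside it.
        import java.io.IOException;
        import java.io.OutputStream;
        import java.nio.charset.StandardCharsets;
        import java.util.concurrent.BlockingQueue;
        import java.util.concurrent.LinkedBlockingQueue;

        public class HistoryLikeWriter {
            private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

            // Safe to call from inside a synchronized section: never touches the network.
            public void enqueue(String historyRecord) {
                pending.add(historyRecord);
            }

            // Runs on a dedicated thread; blocking here does not hold the global lock.
            public void drainTo(OutputStream out) throws IOException, InterruptedException {
                while (!Thread.currentThread().isInterrupted()) {
                    String record = pending.take();
                    out.write(record.getBytes(StandardCharsets.UTF_8));
                    out.flush();
                }
            }
        }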

        firegun added a comment -

        Thanks for your reply. We have a plan to upgrade Hadoop to 2.0.x next month, but we need to do some testing first, because the version in production is modified based on 1.0.3.

        I think the 1.0.3 version will stay in place for at least another month.

        Before this, the server had worked very well for more than a year.

        For now, all I can do is monitor the JobTracker log and, when a datanode crashes, add that datanode to the exclude list.

        Can you give me some ideas, or explain why this happened?

        Do I need to lower the socket timeout or the failure count?

        Thanks

        Steve Loughran added a comment -

        Does this still happen when you upgrade to the most recent version of Hadoop 1.x (or, better yet, Hadoop 2.2)?


          People

          • Assignee: Unassigned
          • Reporter: firegun
          • Votes: 0
          • Watchers: 6
