Hadoop HDFS / HDFS-1970

A NullPointerException occurs during NameNode lease recovery when the client has not responded to the NameNode for longer than the hard limit and the current block is larger than the block size previously recorded in the NameNode


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version: 0.20-append
    • Fix Version: 0.20-append
    • Component: namenode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      A NullPointerException occurs during NameNode lease recovery when the client has not responded to the NameNode for longer than the hard limit and the current block is larger than the block size previously recorded in the NameNode.

      Steps to reproduce:
      1. Write to 2 datanodes using a client.
      2. Kill one datanode and allow pipeline recovery.
      3. Write some more data to the same block.
      4. In parallel, allow NameNode lease recovery to happen.
      A NullPointerException is then thrown in the addStoredBlock API.

      Please find the logs below:

      Debugging in MachineName..
      Listening for transport dt_socket at address: 8007
      11/05/20 21:38:33 INFO namenode.NameNode: STARTUP_MSG:
      /************************************************************
      STARTUP_MSG: Starting NameNode
      STARTUP_MSG: host = linux76/10.18.52.76
      STARTUP_MSG: args = []
      STARTUP_MSG: version = 0.20.3-SNAPSHOT
      STARTUP_MSG: build = -r ; compiled by 'G00900856' on Tue Feb 1 11:40:14 IST 2011
      ************************************************************/
      11/05/20 21:38:33 INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
      11/05/20 21:38:33 INFO namenode.NameNode: Namenode up at: linux76/10.18.52.76:9000
      11/05/20 21:38:33 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
      11/05/20 21:38:33 INFO metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
      11/05/20 21:38:33 INFO namenode.FSNamesystem: fsOwner=root,root
      11/05/20 21:38:33 INFO namenode.FSNamesystem: supergroup=supergroup
      11/05/20 21:38:33 INFO namenode.FSNamesystem: isPermissionEnabled=false
      11/05/20 21:38:33 INFO metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
      11/05/20 21:38:33 INFO namenode.FSNamesystem: Registered FSNamesystemStatusMBean
      11/05/20 21:38:33 INFO common.Storage: Number of files = 1
      11/05/20 21:38:33 INFO common.Storage: Number of files under construction = 0
      11/05/20 21:38:33 INFO common.Storage: Image file of size 94 loaded in 0 seconds.
      11/05/20 21:38:33 INFO common.Storage: Edits file /home/ramkrishna/opensrchadoop/appendbranch/hadoop-0.20.3-SNAPSHOT/bin/../hadoop-root/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
      11/05/20 21:38:33 INFO common.Storage: Image file of size 94 saved in 0 seconds.
      11/05/20 21:38:34 INFO common.Storage: Image file of size 94 saved in 0 seconds.
      11/05/20 21:38:34 INFO namenode.FSNamesystem: Finished loading FSImage in 482 msecs
      11/05/20 21:38:34 INFO namenode.FSNamesystem: Total number of blocks = 0
      11/05/20 21:38:34 INFO namenode.FSNamesystem: Number of invalid blocks = 0
      11/05/20 21:38:34 INFO namenode.FSNamesystem: Number of under-replicated blocks = 0
      11/05/20 21:38:34 INFO namenode.FSNamesystem: Number of over-replicated blocks = 0
      11/05/20 21:38:34 INFO hdfs.StateChange: STATE* Leaving safe mode after 0 secs.
      11/05/20 21:38:34 INFO hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
      11/05/20 21:38:34 INFO hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
      11/05/20 21:38:34 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
      11/05/20 21:38:35 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
      11/05/20 21:38:35 INFO http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
      11/05/20 21:38:35 INFO http.HttpServer: Jetty bound to port 50070
      11/05/20 21:38:35 INFO mortbay.log: jetty-6.1.14
      11/05/20 21:38:37 INFO mortbay.log: Started SelectChannelConnector@linux76:50070
      11/05/20 21:38:37 INFO namenode.NameNode: Web-server up at: linux76:50070
      11/05/20 21:38:37 INFO ipc.Server: IPC Server Responder: starting
      11/05/20 21:38:37 INFO ipc.Server: IPC Server listener on 9000: starting
      11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 0 on 9000: starting
      11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 1 on 9000: starting
      11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 2 on 9000: starting
      11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 3 on 9000: starting
      11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 4 on 9000: starting
      11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 5 on 9000: starting
      11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 6 on 9000: starting
      11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 7 on 9000: starting
      11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 8 on 9000: starting
      11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 9 on 9000: starting
      11/05/20 21:38:38 INFO hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 10.18.52.76:50010 storage DS-1868335495-10.18.52.76-50010-1305907718956
      11/05/20 21:38:38 INFO net.NetworkTopology: Adding a new node: /default-rack/10.18.52.76:50010
      11/05/20 21:38:41 INFO hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 10.18.52.5:50010 storage DS-721814011-10.18.52.5-50010-1305928120116
      11/05/20 21:38:41 INFO net.NetworkTopology: Adding a new node: /default-rack/10.18.52.5:50010
      11/05/20 21:38:54 INFO FSNamesystem.audit: ugi=R00902313,Tardis ip=/10.18.47.133 cmd=create src=/synctest0 dst=null perm=R00902313:supergroup:rw-r-r-
      11/05/20 21:38:54 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /synctest0. blk_-8455334393673385140_1001
      11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.18.52.5:50010 is added to blk_-8455334393673385140_1001 size 1024
      11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.18.52.76:50010 is added to blk_-8455334393673385140_1001 size 1024
      11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /synctest0. blk_8955752156241965528_1001
      11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.18.52.76:50010 is added to blk_8955752156241965528_1001 size 1024
      11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.18.52.5:50010 is added to blk_8955752156241965528_1001 size 1024
      11/05/20 21:39:11 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /synctest0. blk_-5249191180607448701_1001
      11/05/20 21:40:18 INFO namenode.FSNamesystem: commitBlockSynchronization(lastblock=blk_-5249191180607448701_1001, newgenerationstamp=1002, newlength=833, newtargets=[10.18.52.76:50010], closeFile=false, deleteBlock=false)
      11/05/20 21:40:18 INFO namenode.FSNamesystem: Number of transactions: 4 Total time for transactions(ms): 2Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 28
      11/05/20 21:40:18 INFO namenode.FSNamesystem: commitBlockSynchronization(blk_-5249191180607448701_1002) successful
      11/05/20 21:41:41 INFO namenode.LeaseManager: Lease [Lease. Holder: DFSClient_-676883980, pendingcreates: 1] has expired hard limit
      11/05/20 21:41:56 INFO namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_-676883980, pendingcreates: 1], src=/synctest0
      11/05/20 21:41:56 INFO hdfs.StateChange: BLOCK* blk_-5249191180607448701_1002 recovery started, primary=10.18.52.76:50010
      11/05/20 21:42:09 INFO namenode.FSNamesystem: commitBlockSynchronization(lastblock=blk_-5249191180607448701_1002, newgenerationstamp=1003, newlength=867, newtargets=[10.18.52.76:50010], closeFile=true, deleteBlock=false)
      11/05/20 21:42:26 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_-5249191180607448701_1003 on 10.18.52.76:50010 size 867 and it belongs to a file under construction.
      11/05/20 21:42:52 WARN namenode.FSNamesystem: Inconsistent size for block blk_-5249191180607448701_1003 reported from 10.18.52.76:50010 current size is 833 reported size is 867
      11/05/20 21:43:11 WARN namenode.FSNamesystem: Block blk_-5249191180607448701_1003 reported from 10.18.52.76:50010 does not exist in blockMap. Surprise! Surprise!
      11/05/20 21:43:13 INFO hdfs.StateChange: Removing lease on file /synctest0 from client NN_Recovery
      11/05/20 21:43:13 INFO hdfs.StateChange: BLOCK* ask 10.18.52.76:50010 to replicate blk_-5249191180607448701_1003 to datanode(s) 10.18.52.5:50010
      11/05/20 21:43:13 INFO namenode.FSNamesystem: Number of transactions: 6 Total time for transactions(ms): 2Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 30
      11/05/20 21:43:13 INFO namenode.FSNamesystem: commitBlockSynchronization(newblock=blk_-5249191180607448701_1003, file=/synctest0, newgenerationstamp=1003, newlength=867, newtargets=[10.18.52.76:50010]) successful
      11/05/20 21:43:15 INFO ipc.Server: IPC Server handler 6 on 9000, call blockReceived(DatanodeRegistration(10.18.52.76:50010, storageID=DS-1868335495-10.18.52.76-50010-1305907718956, infoPort=50075, ipcPort=50020), [Lorg.apache.hadoop.hdfs.protocol.Block;@9b774e, [Ljava.lang.String;@b5d05b) from 10.18.52.76:40343: error: java.io.IOException: java.lang.NullPointerException
      java.io.IOException: java.lang.NullPointerException
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addStoredBlock(FSNamesystem.java:3173)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.blockReceived(FSNamesystem.java:3592)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReceived(NameNode.java:756)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
      11/05/20 21:43:19 WARN hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request received for blk_-5249191180607448701_1003 on 10.18.52.76:50010 size 867
      11/05/20 21:43:19 INFO hdfs.StateChange: BLOCK* ask 10.18.52.76:50010 to replicate blk_-5249191180607448701_1003 to datanode(s) 10.18.52.5:50010
      Listening for transport dt_socket at address: 8007
      11/05/20 21:43:22 INFO hdfs.StateChange: BLOCK* ask 10.18.52.76:50010 to replicate blk_-5249191180607448701_1003 to datanode(s) 10.18.52.5:50010
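      The stack trace points at FSNamesystem.addStoredBlock dereferencing a block that the blockMap can no longer resolve: commitBlockSynchronization has already bumped the generation stamp to 1003, and shortly afterwards the log shows "does not exist in blockMap. Surprise! Surprise!" followed by the NPE on the blockReceived path. Below is a minimal, hypothetical Java sketch of that null-dereference pattern; the class, map layout, and method bodies are invented for illustration and are not the actual Hadoop code, though the block ID, sizes, and generation stamps are taken from the log above.

      ```java
      import java.util.HashMap;
      import java.util.Map;

      public class StaleBlockLookup {
          static class BlockInfo {
              final long id;
              final long genStamp;
              long numBytes;
              BlockInfo(long id, long genStamp, long numBytes) {
                  this.id = id; this.genStamp = genStamp; this.numBytes = numBytes;
              }
          }

          // Hypothetical blocksMap keyed by "id_genstamp": once recovery bumps the
          // generation stamp, a report carrying the new stamp no longer matches
          // the key under which the block was stored.
          static final Map<String, BlockInfo> blocksMap = new HashMap<>();

          static String key(long id, long gs) { return id + "_" + gs; }

          static long addStoredBlock(long id, long gs, long reportedSize) {
              BlockInfo stored = blocksMap.get(key(id, gs)); // null for a stale/unknown key
              // Missing null check: dereferencing 'stored' throws NullPointerException
              stored.numBytes = Math.max(stored.numBytes, reportedSize);
              return stored.numBytes;
          }

          public static void main(String[] args) {
              // Block registered under generation stamp 1002 (values from the log)
              blocksMap.put(key(-5249191180607448701L, 1002L),
                      new BlockInfo(-5249191180607448701L, 1002L, 833L));
              // commitBlockSynchronization has bumped the stamp to 1003; the
              // datanode's blockReceived report uses 1003 and finds no entry.
              try {
                  addStoredBlock(-5249191180607448701L, 1003L, 867L);
                  System.out.println("lookup succeeded");
              } catch (NullPointerException e) {
                  System.out.println("NullPointerException in addStoredBlock");
              }
          }
      }
      ```

      The fix direction this suggests is a null check (or a re-keying of the blockMap entry when commitBlockSynchronization changes the generation stamp) before the stored block is dereferenced.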


      People

        Assignee: Unassigned
        Reporter: ramkrishna.s.vasudevan (ram_krish)
