Hadoop HDFS
HDFS-1982

NullPointerException is thrown when the NN restarts with a block smaller in size than the block present on DN1 but with a greater generation stamp in the NN

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20-append
    • Fix Version/s: 0.20-append
    • Component/s: namenode
    • Labels:
      None
    • Environment:

      Linux

      Description

      Consider the following scenario.
      We have a cluster with one NN and two DNs.

      We write a file.

      One of its blocks is written to DN1 but is not yet complete on DN2's local disk.

      Now DN1 gets killed, so pipeline recovery happens for the block using the size recorded on DN2, and the generation stamp gets updated in the NN.

      DN2 also gets killed.

      Now restart the NN and DN1.
      After the restart, the block recorded in the NN has a greater generation stamp but a smaller size than the replica on DN1.

      This leads to a NullPointerException in the addStoredBlock API.
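
      For illustration, here is a minimal, self-contained sketch of this failure mode. BlockMeta, blocksMap, and the simplified addStoredBlock signature below are assumptions for the sketch, not the actual FSNamesystem code:

      import java.util.HashMap;
      import java.util.Map;

      // Stand-in for the NN-side lookup inside addStoredBlock.
      public class AddStoredBlockGuardSketch {
        static class BlockMeta {
          final long id;
          final long genStamp;
          BlockMeta(long id, long genStamp) {
            this.id = id;
            this.genStamp = genStamp;
          }
        }

        private final Map<Long, BlockMeta> blocksMap = new HashMap<Long, BlockMeta>();

        boolean addStoredBlock(BlockMeta reported) {
          // A replica reported with a mismatched generation stamp can miss the
          // lookup; dereferencing the result without a null check is exactly
          // the kind of NullPointerException described above.
          BlockMeta stored = blocksMap.get(reported.id);
          if (stored == null) {
            System.out.println("Ignoring report of unknown/stale block " + reported.id);
            return false;
          }
          return stored.genStamp <= reported.genStamp;
        }
      }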

        Activity

        dhruba borthakur added a comment -

        Theoretically, this seems plausible. Are you experiencing it? Do you have a unit test that I can use to reliably reproduce this problem?

        ramkrishna.s.vasudevan added a comment -
        package org.apache.hadoop.hdfs.server.namenode;

        import static org.junit.Assert.assertTrue;

        import java.io.IOException;
        import java.io.OutputStream;
        import java.util.ArrayList;
        import java.util.List;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.hdfs.MiniDFSCluster;
        import org.apache.hadoop.hdfs.protocol.Block;
        import org.apache.hadoop.hdfs.protocol.LocatedBlock;
        import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
        import org.junit.Test;

        public class TestAddStoredBlockForNPE {

          @Test
          public void testAddStoredBlockShouldInvalidateStaleBlock() throws Exception {
            MiniDFSCluster cluster = null;
            // Sentinel: the element is removed only if the try block completes,
            // so the assertion in finally detects an unexpected exception.
            List<String> list = new ArrayList<String>();
            list.add("test");
            try {
              Configuration conf = new Configuration();
              conf.setInt("dfs.block.size", 1024);
              cluster = new MiniDFSCluster(conf, 2, true, null);
              FileSystem dfs = cluster.getFileSystem();
              String fname = "/test";
              FSDataOutputStream fsdataout = dfs.create(new Path(fname));
              int fileLen = 10 * 1024 + 94;
              // Write the data but leave the stream open so the file stays
              // under construction.
              write(fsdataout, 0, fileLen);
              FSNamesystem namesystem = cluster.getNameNode().namesystem;

              LocatedBlocks blockLocations = cluster.getNameNode().getBlockLocations(
                  fname, 0, fileLen);
              List<LocatedBlock> blockList = blockLocations.getLocatedBlocks();
              Block block = blockList.get(blockList.size() - 1).getBlock();

              // Forge a stale replica of the last block: same id, but an older
              // generation stamp and a larger size than the NN's record.
              Block block1 = new Block();
              block1.setBlockId(block.getBlockId());
              block1.setGenerationStamp(block.getGenerationStamp() - 10);
              block1.setNumBytes(block.getNumBytes() + 10);

              // Report the stale replica to the NN; this drives addStoredBlock
              // and triggers the NullPointerException described above.
              namesystem.blockReceived(cluster.getDataNodes().get(1).dnRegistration,
                  block1, null);
              dfs.close();
              list.remove(0);
            } finally {
              if (null != cluster) {
                cluster.shutdown();
              }
              assertTrue("The flow should have executed without a NullPointerException",
                  list.size() == 0);
            }
          }

          private static void write(OutputStream out, int offset, int length)
              throws IOException {
            final byte[] bytes = new byte[length];
            for (int i = 0; i < length; i++) {
              bytes[i] = (byte) (offset + i);
            }
            out.write(bytes);
          }
        }
        
        ramkrishna.s.vasudevan added a comment -

        Regarding this check:

        if (storedBlock != null
            && storedBlock.getINode() != null
            && (storedBlock.getGenerationStamp() <= block.getGenerationStamp()
                || storedBlock.getINode().isUnderConstruction()))

        What is the significance of checking whether the INode is under construction?
        Wouldn't the generation stamp check alone be enough?

        Kindly provide your comments.

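        For reference, here is a self-contained paraphrase of that condition. Stored and Reported are simplified stand-ins for the NN's stored block and the reported block, and the reading of the under-construction clause is only one plausible interpretation:

        public class StoredBlockCheckSketch {
          static class Reported {
            final long genStamp;
            Reported(long genStamp) { this.genStamp = genStamp; }
          }

          static class Stored {
            final long genStamp;
            final boolean hasInode;
            final boolean underConstruction;
            Stored(long genStamp, boolean hasInode, boolean underConstruction) {
              this.genStamp = genStamp;
              this.hasInode = hasInode;
              this.underConstruction = underConstruction;
            }
          }

          // True when the reported replica should be processed further.
          static boolean shouldAccept(Stored stored, Reported reported) {
            if (stored == null || !stored.hasInode) {
              return false; // unknown block, or the file was already deleted
            }
            // Accept a report at least as new as the NN's record, or any report
            // for a file still under construction: during pipeline recovery a
            // replica may legitimately carry an older generation stamp.
            return stored.genStamp <= reported.genStamp || stored.underConstruction;
          }

          public static void main(String[] args) {
            // The scenario from this issue: the NN's stamp (12) is newer than
            // the stamp of the replica DN1 reports after the restart (2).
            System.out.println(shouldAccept(new Stored(12, true, true), new Reported(2)));
          }
        }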

          People

          • Assignee:
            Unassigned
          • Reporter:
            ramkrishna.s.vasudevan
          • Votes:
            0
          • Watchers:
            3
