Hadoop HDFS / HDFS-8344

NameNode doesn't recover lease for files with missing blocks

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.0
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None
    • Release Note: Allow a configuration to specify the maximum number of recovery attempts for blocks under construction.

    Description

      I found another(?) instance in which the lease is not recovered. It is easily reproducible on a pseudo-distributed single-node cluster:

      1. Before you start, it helps to shorten the lease limits as follows. This is not necessary; it simply reduces how long you have to wait.
              public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
              public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD;
        
      2. Client starts to write a file. It may write less than one block, but it has called hflush, so some of the data has landed on the datanodes. (I'm attaching the client code I am using. I build a jar and run it with $ hadoop jar TestHadoop.jar)
      3. Client crashes. (I simulate this by running kill -9 on the $(hadoop jar TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
      4. Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only one.)
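
To make the waiting time in the steps above concrete, here is a minimal sketch of how the shortened limits from step 1 map elapsed time to a lease state. The class LeaseTimeline, the LeaseState enum, and the classify method are hypothetical names for illustration only and are not part of Hadoop. The strict "greater than" comparison at each boundary is also an assumption.

```java
/** Hypothetical helper (not a Hadoop class): classifies a lease by time since its last renewal. */
final class LeaseTimeline {
    // Shortened values from step 1 of the reproduction.
    static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
    static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD;

    /** Illustrative states; the names are assumptions, not Hadoop identifiers. */
    enum LeaseState { HELD, SOFT_EXPIRED, HARD_EXPIRED }

    /**
     * Past the soft limit another client may take over the lease;
     * past the hard limit the NameNode is expected to start lease recovery itself.
     */
    static LeaseState classify(long millisSinceLastRenewal) {
        if (millisSinceLastRenewal > LEASE_HARDLIMIT_PERIOD) {
            return LeaseState.HARD_EXPIRED;
        }
        if (millisSinceLastRenewal > LEASE_SOFTLIMIT_PERIOD) {
            return LeaseState.SOFT_EXPIRED;
        }
        return LeaseState.HELD;
    }

    public static void main(String[] args) {
        // Sample points: before the soft limit, between the limits, after the hard limit.
        for (long seconds : new long[] {10, 45, 90}) {
            System.out.println(seconds + "s since last renewal: " + classify(seconds * 1000));
        }
    }
}
```

With these values, anything more than 60 seconds after the client is killed falls in the hard-expired range, which is the point at which the recovery described below would be expected to begin.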

      I believe the lease should be recovered and the block should be marked missing. However, this does not happen: the lease is never recovered.
      The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the NameNode never released the leases, even after the NameNode was restarted and even months afterwards. There are actually several other cases in which we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on those for another time.

      Attachments

        1. HDFS-8344.01.patch
          7 kB
          Ravi Prakash
        2. HDFS-8344.02.patch
          7 kB
          Ravi Prakash
        3. HDFS-8344.03.patch
          8 kB
          Ravi Prakash
        4. HDFS-8344.04.patch
          8 kB
          Ravi Prakash
        5. HDFS-8344.05.patch
          8 kB
          Ravi Prakash
        6. HDFS-8344.06.patch
          11 kB
          Ravi Prakash
        7. HDFS-8344.07.patch
          12 kB
          Ravi Prakash
        8. HDFS-8344.08.patch
          8 kB
          Ravi Prakash
        9. HDFS-8344.09.patch
          11 kB
          Ravi Prakash
        10. HDFS-8344.10.patch
          10 kB
          Ravi Prakash
        11. TestHadoop.java
          1 kB
          Ravi Prakash

            People

              Assignee: Unassigned
              Reporter: Ravi Prakash (raviprak)
              Votes: 0
              Watchers: 31