Hadoop HDFS / HDFS-4882

Prevent the Namenode's LeaseManager from looping forever in checkLeases

      Description

      Scenario:
      1. cluster with 4 DNs
      2. the size of the file to be written is a little more than one block
      3. write the first block to 3 DNs, DN1->DN2->DN3
      4. all the data packets of the first block are successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out
      5. DN2 and DN3 are down
      6. the client recovers the pipeline, but no new DN is added to the pipeline because the current pipeline stage is PIPELINE_CLOSE
      7. the client continues writing the last block, and tries to close the file after writing all the data
      8. the NN finds that the penultimate block doesn't have enough replicas (our dfs.namenode.replication.min=2), so the client's close runs into an indefinite loop (HDFS-2936); at the same time, the NN sets the last block's state to COMPLETE
      9. shut down the client
      10. the file's lease exceeds the hard limit
      11. the LeaseManager detects this and begins lease recovery by calling fsnamesystem.internalReleaseLease()
      12. but the last block's state is COMPLETE, which sends the LeaseManager into an infinite loop that prints massive logs like this:

      2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit
      2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src=/user/h_wuzesheng/test.dat
      2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE
      2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1]
      

      (The third log line is a debug log added by us.)
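      The loop in steps 11 and 12 can be illustrated with a toy model. This is a hedged sketch, not the NameNode's code. ToyLease, tryRelease(), checkLeasesUnbounded() and checkLeasesOnce() are hypothetical names. tryRelease() only mimics the situation reported here: the last block is COMPLETE and the penultimate block is below dfs.namenode.replication.min. In that situation "recovery" cannot close the file, and the lease is neither removed nor renewed. A loop that keeps re-reading the oldest expired lease then never advances. Visiting each expired lease at most once per pass is one way to bound it, and is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class LeaseLoopSketch {
    // Hypothetical, simplified lease: holder, last renewal time, and the state
    // of the file it protects. Not the real HDFS Lease class.
    static final class ToyLease implements Comparable<ToyLease> {
        final String holder;
        final long lastRenewed;
        final boolean lastBlockComplete;
        final int penultimateReplicas;

        ToyLease(String holder, long lastRenewed, boolean lastBlockComplete, int penultimateReplicas) {
            this.holder = holder;
            this.lastRenewed = lastRenewed;
            this.lastBlockComplete = lastBlockComplete;
            this.penultimateReplicas = penultimateReplicas;
        }

        boolean expiredHardLimit(long now, long hardLimit) {
            return now - lastRenewed > hardLimit;
        }

        // Oldest lease first, ties broken by holder name.
        @Override
        public int compareTo(ToyLease o) {
            int c = Long.compare(lastRenewed, o.lastRenewed);
            return c != 0 ? c : holder.compareTo(o.holder);
        }
    }

    // Mirrors dfs.namenode.replication.min=2 from the report.
    static final int MIN_REPLICATION = 2;

    // Hypothetical stand-in for internalReleaseLease(): true means the file was
    // closed and the lease can be dropped; false means the lease stays as it is.
    static boolean tryRelease(ToyLease lease) {
        if (lease.lastBlockComplete && lease.penultimateReplicas < MIN_REPLICATION) {
            return false; // nothing left to recover, yet the file cannot be closed
        }
        return true; // pretend recovery closes the file immediately
    }

    // Problematic shape: re-examines the oldest lease while it is expired.
    // maxIterations is a demo-only guard; the modeled loop has no such exit.
    static int checkLeasesUnbounded(TreeSet<ToyLease> sorted, long now, long hardLimit, int maxIterations) {
        int iterations = 0;
        while (!sorted.isEmpty() && sorted.first().expiredHardLimit(now, hardLimit)) {
            if (iterations >= maxIterations) {
                break;
            }
            iterations++;
            ToyLease oldest = sorted.first();
            if (tryRelease(oldest)) {
                sorted.remove(oldest);
            }
        }
        return iterations;
    }

    // Bounded shape: snapshot the expired leases, then visit each one once.
    // A lease that cannot be released is left for the next periodic pass.
    static int checkLeasesOnce(TreeSet<ToyLease> sorted, long now, long hardLimit) {
        List<ToyLease> expired = new ArrayList<>();
        for (ToyLease l : sorted) {
            if (!l.expiredHardLimit(now, hardLimit)) {
                break; // the set is ordered oldest first
            }
            expired.add(l);
        }
        for (ToyLease l : expired) {
            if (tryRelease(l)) {
                sorted.remove(l);
            }
        }
        return expired.size();
    }

    public static void main(String[] args) {
        long hardLimit = 60L * 60 * 1000; // one hour in ms
        long now = 10 * hardLimit;
        TreeSet<ToyLease> leases = new TreeSet<>();
        leases.add(new ToyLease("DFSClient_A", 0L, true, 1));  // the stuck file
        leases.add(new ToyLease("DFSClient_B", 1L, false, 3)); // an ordinary expired lease

        int spins = checkLeasesUnbounded(new TreeSet<>(leases), now, hardLimit, 1_000);
        int visits = checkLeasesOnce(leases, now, hardLimit);
        System.out.println("unbounded loop iterations: " + spins);
        System.out.println("bounded pass visited: " + visits + ", remaining: " + leases.size());
    }
}
```

      Running main() prints "unbounded loop iterations: 1000" and "bounded pass visited: 2, remaining: 1". The unbounded loop only stops at its demo guard and never reaches DFSClient_B. The bounded pass releases DFSClient_B and leaves the stuck lease for a later pass.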

        Attachments

        1. 4882.1.patch
          3 kB
          Zesheng Wu
        2. 4882.patch
          1 kB
          Zesheng Wu
        3. 4882.patch
          1 kB
          Zesheng Wu
        4. HDFS-4882.patch
          4 kB
          Ravi Prakash
        5. HDFS-4882.1.patch
          6 kB
          Ravi Prakash
        6. HDFS-4882.2.patch
          9 kB
          Ravi Prakash
        7. HDFS-4882.3.patch
          9 kB
          Ravi Prakash
        8. HDFS-4882.4.patch
          9 kB
          Ravi Prakash
        9. HDFS-4882.5.patch
          9 kB
          Ravi Prakash
        10. HDFS-4882.6.patch
          9 kB
          Ravi Prakash
        11. HDFS-4882.7.patch
          9 kB
          Ravi Prakash

              People

              • Assignee:
                raviprak Ravi Prakash
                Reporter:
                wuzesheng Zesheng Wu
              • Votes:
                1
                Watchers:
                28
