Hadoop HDFS / HDFS-14498

LeaseManager can loop forever on the file for which create has failed


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.0
    • Fix Version/s: 3.2.2, 2.10.1, 3.3.1, 3.4.0
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The logs from the file creation are long gone because of the infinite lease logging, but the create presumably failed. The client that was trying to write this file is definitely long dead.
      The version includes HDFS-4882.
      We get this log pattern repeating infinitely:

      2019-05-16 14:00:16,893 INFO [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard limit
      2019-05-16 14:00:16,893 INFO [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=<snip>
      2019-05-16 14:00:16,893 WARN [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: Failed to release lease for file <snip>. Committed blocks are waiting to be minimally replicated. Try again later.
      2019-05-16 14:00:16,893 WARN [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path <snip> in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1]. It will be retried.
      org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* NameSystem.internalReleaseLease: Failed to release lease for file <snip>. Committed blocks are waiting to be minimally replicated. Try again later.
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
      	at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
      	at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
      	at java.lang.Thread.run(Thread.java:745)
      
      
      
      $  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1" hdfs_nn*
      hdfs_nn.log:1068035
      hdfs_nn.log.2019-05-16-14:1516179
      hdfs_nn.log.2019-05-16-15:1538350
      

      Aside from an actual bug fix, it might make sense to make LeaseManager log less, in case there are more bugs like this.
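
      To illustrate the "log less" idea, here is a minimal sketch of per-key log throttling in plain Java. It is not the change made in the attached patches, and it does not use any Hadoop class: ThrottledLog, record, and the one-second interval are hypothetical names and values chosen for illustration only. The key would be something like the lease holder, so that a lease stuck in the retry loop produces one message per interval instead of one per retry.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Hadoop): lets a message through at most
 * once per key per interval and counts the repeats it suppressed.
 */
final class ThrottledLog {
  private final long minIntervalMs;
  private final Map<String, Long> lastLoggedMs = new HashMap<>();
  private final Map<String, Long> suppressed = new HashMap<>();

  ThrottledLog(long minIntervalMs) {
    this.minIntervalMs = minIntervalMs;
  }

  /**
   * Returns the text to log, or null if this call should stay silent.
   * The caller passes the clock value so the behaviour is easy to test.
   */
  synchronized String record(String key, String message, long nowMs) {
    Long last = lastLoggedMs.get(key);
    if (last != null && nowMs - last < minIntervalMs) {
      // Still inside the quiet window for this key: count it and stay silent.
      suppressed.merge(key, 1L, Long::sum);
      return null;
    }
    lastLoggedMs.put(key, nowMs);
    Long skipped = suppressed.remove(key);
    return skipped == null
        ? message
        : message + " (suppressed " + skipped + " similar messages)";
  }

  public static void main(String[] args) {
    ThrottledLog log = new ThrottledLog(1000);
    // Simulate a lease that fails to release on every retry, 200 ms apart.
    for (long t = 0; t <= 2000; t += 200) {
      String out = log.record("lease-holder-1", "Cannot release the path", t);
      if (out != null) {
        System.out.println(t + " ms: " + out);
      }
    }
  }
}
```

      The suppressed-repeat count keeps the information that the retries are still happening without writing a line for each one.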

      Attachments

        1. HDFS-14498.001.patch
          5 kB
          Stephen O'Donnell
        2. HDFS-14498.002.patch
          7 kB
          Stephen O'Donnell
        3. HDFS-14498-branch-2.10.001.patch
          7 kB
          Xiaoqiao He
        4. HDFS-14498-branch-2.10.002.patch
          7 kB
          Xiaoqiao He
        5. HDFS-14498-branch-2.10.002.patch
          7 kB
          Xiaoqiao He
        6. HDFS-14498-branch-2.10.003.patch
          7 kB
          Xiaoqiao He
        7. HDFS-14498-branch-2.10.004.patch
          7 kB
          Stephen O'Donnell
        8. HDFS-14498-branch-3.1.001.patch
          7 kB
          Xiaoqiao He
        9. HDFS-14498-branch-3.1.001.patch
          7 kB
          Xiaoqiao He
        10. HDFS-14498-branch-3.1.001.patch
          7 kB
          Xiaoqiao He
        11. HDFS-14498-branch-3.2.001.patch
          7 kB
          Xiaoqiao He


          People

            Assignee: Stephen O'Donnell (sodonnell)
            Reporter: Sergey Shelukhin (sershe)
            Votes: 0
            Watchers: 24
