Hadoop HDFS / HDFS-11817

A faulty node can cause a lease leak and NPE on accessing data

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When the namenode performs lease recovery for a failed write, commitBlockSynchronization() fails if none of the new targets has sent a received-IBR (incremental block report). At this point the data is inaccessible, because the namenode throws a NullPointerException on getBlockLocations().

      The namenode retries the lease recovery in about an hour. If the nodes are faulty (usually when there is only one new target), they may still not have sent a block report by then. In that case, lease recovery throws an AlreadyBeingCreatedException, which causes LeaseManager to simply remove the lease without finalizing the inode.

      This leaves the lease state inconsistent: the inode stays under construction, but no further lease recovery is attempted. Manual lease recovery is not allowed either.
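
The sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hadoop's actual code: the classes `LeaseLeakSketch` and `ToyNamenode`, the methods `startWrite`, `checkLease`, and `isLeaked`, and the local `AlreadyBeingCreatedException` are hypothetical stand-ins that borrow names from the description. The model shows how dropping the lease when the retry throws leaves a file under construction with no lease.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: none of these classes mirror Hadoop's real implementation.
final class LeaseLeakSketch {

    // Hypothetical stand-in for the exception named in the description.
    static class AlreadyBeingCreatedException extends Exception {
        AlreadyBeingCreatedException(String path) {
            super("recovery still in progress for " + path);
        }
    }

    static class ToyNamenode {
        final Set<String> underConstruction = new HashSet<>(); // files not yet finalized
        final Set<String> leases = new HashSet<>();            // files with an active lease
        final Set<String> reported = new HashSet<>();          // files whose new target sent a report

        void startWrite(String path) {
            underConstruction.add(path);
            leases.add(path);
        }

        // Succeeds only if a new target has reported the replica.
        boolean commitBlockSynchronization(String path) {
            if (!reported.contains(path)) {
                return false;
            }
            underConstruction.remove(path);
            leases.remove(path);
            return true;
        }

        // Retried recovery: throws if the file still cannot be finalized.
        void recoverLease(String path) throws AlreadyBeingCreatedException {
            if (!commitBlockSynchronization(path)) {
                throw new AlreadyBeingCreatedException(path);
            }
        }

        // Models the problematic handling: on the exception, the lease is
        // removed even though the file was never finalized.
        void checkLease(String path) {
            try {
                recoverLease(path);
            } catch (AlreadyBeingCreatedException e) {
                leases.remove(path);
            }
        }

        // Inconsistent state: still under construction, but no lease left.
        boolean isLeaked(String path) {
            return underConstruction.contains(path) && !leases.contains(path);
        }
    }

    public static void main(String[] args) {
        ToyNamenode nn = new ToyNamenode();
        nn.startWrite("/data/file");
        nn.commitBlockSynchronization("/data/file"); // first attempt fails: no report yet
        nn.checkLease("/data/file");                 // retry throws, lease is dropped
        System.out.println("leaked: " + nn.isLeaked("/data/file"));
    }
}
```

In this model, a file whose target does report before the retry is finalized normally, so the leak only appears when the retry also finds no report.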

      Attachments

        1. HDFS-11817.branch-2.7.001.patch (15 kB, Wei-Chiu Chuang)
        2. HDFS-11817.v2.branch-2.8.patch (16 kB, Kihwal Lee)
        3. HDFS-11817.v2.trunk.patch (13 kB, Kihwal Lee)
        4. HDFS-11817.v2.branch-2.patch (16 kB, Kihwal Lee)
        5. HDFS-11817.branch-2.patch (17 kB, Kihwal Lee)
        6. hdfs-11817_supplement.txt (7 kB, Kihwal Lee)

            People

              Assignee: Kihwal Lee (kihwal)
              Reporter: Kihwal Lee (kihwal)
              Votes: 0
              Watchers: 12