  1. Hadoop HDFS
  2. HDFS-11817

A faulty node can cause a lease leak and NPE on accessing data

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When the namenode performs lease recovery for a failed write, commitBlockSynchronization() fails if none of the new targets has sent a received-IBR (incremental block report). At this point the data is inaccessible, because the namenode throws a NullPointerException on getBlockLocations().

      The namenode retries the lease recovery about an hour later. If the nodes are faulty (usually when there is only one new target), they may still not have sent a block report by then. In that case, lease recovery throws an AlreadyBeingCreatedException, which causes LeaseManager to simply remove the lease without finalizing the inode.

      This leaves the lease state inconsistent: the inode stays under construction, but no further lease recovery is attempted, and a manual lease recovery is not allowed either.
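
      The sequence above can be illustrated with a small toy model. This is only a sketch under stated assumptions: LeaseLeakSketch, Inode, openForWrite, checkLeases and isLeaked are hypothetical names invented for illustration, and the boolean flags stand in for "a new target has reported the block". It is not the actual FSNamesystem or LeaseManager code, and the real conditions that raise AlreadyBeingCreatedException are more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only. These classes are hypothetical stand-ins and do NOT
// reproduce the real namenode implementation.
class LeaseLeakSketch {
    /** Hypothetical inode: only tracks the under-construction flag. */
    static class Inode {
        boolean underConstruction = true;
    }

    /** Local stand-in for the exception named in the report. */
    static class AlreadyBeingCreatedException extends Exception {
        AlreadyBeingCreatedException(String msg) { super(msg); }
    }

    final Map<String, Inode> inodes = new HashMap<>();
    final Map<String, String> leases = new HashMap<>(); // path -> lease holder

    /** A client opens a file for write: inode is under construction, lease held. */
    void openForWrite(String path, String holder) {
        inodes.put(path, new Inode());
        leases.put(path, holder);
    }

    /** First recovery attempt: fails when no new target has sent a received-IBR. */
    boolean commitBlockSynchronization(String path, boolean targetSentIbr) {
        if (!targetSentIbr) {
            return false; // inode stays under construction, lease is kept
        }
        inodes.get(path).underConstruction = false;
        leases.remove(path);
        return true;
    }

    /** Retried recovery: modeled as throwing when the target still has not reported. */
    void recoverLease(String path, boolean targetHasReported)
            throws AlreadyBeingCreatedException {
        if (!targetHasReported) {
            throw new AlreadyBeingCreatedException("recovery still pending for " + path);
        }
        inodes.get(path).underConstruction = false;
        leases.remove(path);
    }

    /** Behavior described in the report: on the exception, the lease is just dropped. */
    void checkLeases(String path, boolean targetHasReported) {
        try {
            recoverLease(path, targetHasReported);
        } catch (AlreadyBeingCreatedException e) {
            leases.remove(path); // lease removed, inode never finalized
        }
    }

    /** Leaked state: inode under construction with no lease left to drive recovery. */
    boolean isLeaked(String path) {
        Inode inode = inodes.get(path);
        return inode != null && inode.underConstruction && !leases.containsKey(path);
    }

    public static void main(String[] args) {
        LeaseLeakSketch ns = new LeaseLeakSketch();
        ns.openForWrite("/data/file", "client-1");
        ns.commitBlockSynchronization("/data/file", false); // no received-IBR yet
        ns.checkLeases("/data/file", false);                // faulty node still silent
        System.out.println("leaked = " + ns.isLeaked("/data/file"));
    }
}
```

      In this model, the file ends up with an under-construction inode and no lease, which is the inconsistent state the description refers to. A fix along the lines suggested by the report would have to either keep the lease for a later retry or finalize the inode before the lease is removed.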

        Attachments

        1. hdfs-11817_supplement.txt
          7 kB
          Kihwal Lee
        2. HDFS-11817.branch-2.patch
          17 kB
          Kihwal Lee
        3. HDFS-11817.v2.branch-2.patch
          16 kB
          Kihwal Lee
        4. HDFS-11817.v2.trunk.patch
          13 kB
          Kihwal Lee
        5. HDFS-11817.v2.branch-2.8.patch
          16 kB
          Kihwal Lee

            People

            • Assignee:
              kihwal Kihwal Lee
              Reporter:
              kihwal Kihwal Lee
            • Votes:
              0
              Watchers:
              11