Hadoop HDFS / HDFS-4212

NameNode can't differentiate between a never-created block and a block which is really missing

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.2.0, 3.0.0
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

      Description

      In one test case, NameNode allocated a block and then was killed before the client got the addBlock response.

      After NameNode restarted, the block which was never created was considered as a missing block and FSCK would report the file is corrupted.

      The problem seems to be that NameNode can't differentiate between a never-created block and a block which is really missing.

          Activity

          Brandon Li added a comment -

          Konstantin, let me resolve this JIRA as a dup of HDFS-4452. Please feel free to work a full patch to HDFS-4452 since two people working on a small issue will only slow down the progress.

          Konstantin Shvachko added a comment -

          Brandon I checked both issues. They seem to be dealing with the consequences of the problem.
          From what I hear Sanjay's assessment is right.
          The closest by description issue I found so far is HDFS-3031.

          Let's use HDFS-4452 since it addresses the problem directly.
          I want to make a patch ready this week. LMK if you can work on it now.

          To accelerate we can split the work. I have getAdditionalBlock() impl almost ready. But there is much more: RPC, DFSClient, and tests. I can get you a diff of FSNamesystem. You can merge it and submit the patch under HDFS-4452.

          Brandon Li added a comment -

          not HDFS-4280, it's HDFS-4208.

          Brandon Li added a comment -

          @Konstantin, never-created block is the block with the blockId assigned by NameNode but never created on datanode. NameNode reports this block as missing, and can't delete it from any DataNode. I mentioned this in HDFS-4212 and HDFS-4280, but maybe I should have made it more obvious in the description.

          This could happen when the client couldn't get the addBlock() response from NameNode.

          I am totally OK if you want to use HDFS-4452 to track this problem since it has a much clearer description.
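The scenario in this comment can be illustrated with a toy model. This is a minimal sketch, not HDFS code: `NaiveNameNode`, `addBlock()`, `blockReceived()`, and `blocksWithoutReplicas()` are hypothetical names, and the model assumes a single file and a client that simply retries after a lost response.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only; none of these names are real HDFS classes. */
class NaiveNameNode {
    private long nextBlockId = 1;
    // Block IDs recorded for the single file being written, in order.
    private final List<Long> fileBlocks = new ArrayList<>();
    // Block IDs for which some DataNode has reported a replica.
    private final Set<Long> reportedReplicas = new HashSet<>();

    /** Non-idempotent: every call records a fresh block ID for the file. */
    long addBlock() {
        long id = nextBlockId++;
        fileBlocks.add(id);
        return id;
    }

    /** A DataNode reports that it holds a replica of the given block. */
    void blockReceived(long blockId) {
        reportedReplicas.add(blockId);
    }

    /** Blocks the file references that no DataNode has ever reported. */
    List<Long> blocksWithoutReplicas() {
        List<Long> result = new ArrayList<>();
        for (long id : fileBlocks) {
            if (!reportedReplicas.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }
}

class NeverCreatedBlockDemo {
    static List<Long> run() {
        NaiveNameNode nn = new NaiveNameNode();
        nn.addBlock();                 // response is lost; the client never learns this ID
        long retried = nn.addBlock();  // the retry allocates a second ID
        nn.blockReceived(retried);     // only the second block is written to a DataNode
        return nn.blocksWithoutReplicas();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this model the first block ID stays in the file's block list with no replica anywhere, which is indistinguishable from a block whose replicas were all lost.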

          Suresh Srinivas added a comment -

          > If you knew the problem since November why didn't you report it correctly, it would have saved me and others a lot of time...
          > How should I understand you were talking about getAdditionalBlock() and what in the hell "a never-created block" means, if it is in fsimage.

          Relax.

          We had reported the problem, which we found a long time ago during HA testing. A few weeks back we discussed the solution (thanks to Sanjay Radia for suggesting this) of making getAdditionalBlock() truly idempotent, and we were thinking of fixing it soon.

          Konstantin Shvachko added a comment -

          > Actually it is a problem related to HDFS-4452.

          Which problem? The original description of NN crashing before replying to the client is not a problem and not related to HDFS-4452.
          The problem in comment 2 here is exactly the problem I am talking about, but it was posted after HDFS-4452 was created.
          If you knew the problem since November why didn't you report it correctly, it would have saved me and others a lot of time...
          How should I understand you were talking about getAdditionalBlock() and what in the hell "a never-created block" means, if it is in fsimage.

          Suresh Srinivas added a comment - edited

          > Brandon. The problem mentioned in your original description seems not to be a problem at all. Because client never knows whether block was created or not until it gets a reply from NN. If NN crashes before replying the block will be correctly reported as missing on restart if it was created. This is the nature of distributed computing.

          Actually it is a problem related to HDFS-4452. When a client does not get response for getAdditionalBlock(), it retries. As getAdditionalBlock() stands currently, since it is really not idempotent, new blocks can be allocated. This causes the issue of namenode reporting corruption for open files. I think changing getAdditionalBlock and adding an offset as suggested by Brandon will make it idempotent. On retry, for the same offset, from the same client, namenode can return the block that has already been allocated, instead of creating new ones.
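The retry behaviour proposed in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the HDFS-4452 patch: `IdempotentAllocator` and its method signatures are hypothetical, and a real NameNode would also have to key on the file and validate the request against its own state.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the FSNamesystem.getAdditionalBlock() implementation. */
class IdempotentAllocator {
    private long nextBlockId = 1;
    // Key: client name plus file offset; value: the block ID already handed out.
    private final Map<String, Long> allocated = new HashMap<>();

    /**
     * Returns the block for the given client and offset. A repeated request
     * for the same pair gets the previously allocated block back instead of
     * a new one.
     */
    long addBlock(String clientName, long fileOffset) {
        String key = clientName + "@" + fileOffset;
        return allocated.computeIfAbsent(key, k -> nextBlockId++);
    }

    /** Number of distinct blocks allocated so far. */
    int allocatedCount() {
        return allocated.size();
    }
}
```

With this shape, a client that retries `addBlock("client-1", 0L)` after a lost response receives the same block ID, so no extra block ID is left behind without a replica.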

          Konstantin Shvachko added a comment -

          Brandon. The problem mentioned in your original description seems not to be a problem at all. Because client never knows whether block was created or not until it gets a reply from NN. If NN crashes before replying the block will be correctly reported as missing on restart if it was created. This is the nature of distributed computing.
          I'd advocate closing this jira as not a problem.
          The problem you describe in your last comment is a [very] different problem related to a race condition in getAdditionalBlock(). I tried to explain it in detail in HDFS-4452 just half an hour before your comment. If you are working on it please do. Let's just not mix things and be clear what problem is being solved.

          Brandon Li added a comment -

          Sorry, Yanbo. I thought I had replied to your comment. This is a problem identified in branch-1 in a few deployed environments. I will try your tests with trunk and get back to you soon.

          Part of the problem here is that getAdditionalBlock() (and thus addBlock()) is not really idempotent. When the client, the namenode, or the network between them causes an error, it can leave a block ID assigned but no block created on a datanode.

          If addBlock() were really idempotent, the namenode could identify and delete the dangling block ID when it gets the repeated addBlock() request. One way to make this API idempotent is to add the offset as an input parameter, so the namenode can check the offset to determine whether it is a repeated request. I will upload a patch for that.

          Yanbo Liang added a comment -

          I wrote a test case and ran it against the trunk code. I did not find the state you mentioned above. Can this error only happen in Hadoop 1.* rather than 2.* or higher? Or does my test case not correctly reproduce your condition?


            People

            • Assignee: Brandon Li
            • Reporter: Brandon Li
            • Votes: 0
            • Watchers: 7