Hadoop HDFS / HDFS-2815

Namenode does not come out of safemode after NN crash + restart. FSCK report also shows missing blocks.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.22.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
    • Fix Version/s: 1.1.1, 2.0.0-alpha, 3.0.0
    • Component/s: namenode
    • Labels:
      None

      Description

      While testing HA (internal) with continuous switchovers about 5 minutes apart, I found that some blocks were missing and the namenode went into safemode after the next switch.

      After analysis, I found that these files had already been deleted by clients, but I do not see any delete command logs in the namenode log files. Nevertheless, the namenode had added those blocks to invalidateSets and the DNs had deleted them.
      On restart, the namenode went into safemode and kept expecting more blocks to be reported before it could come out of safemode.
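Why the missing blocks keep the namenode in safemode can be illustrated with a small sketch. This is a simplified model, not the namenode's actual implementation: the method name canLeaveSafeMode and the 0.999 threshold are assumptions chosen for illustration. The idea is only that the namenode compares the blocks reported by DNs against the blocks its namespace says should exist.

```java
public class SafeModeThresholdSketch {
    /**
     * Simplified, hypothetical model of the safemode exit check: the namenode
     * may leave safemode once the number of reported blocks reaches a given
     * fraction of the blocks its namespace expects.
     */
    static boolean canLeaveSafeMode(long expectedBlocks, long reportedBlocks, double thresholdPct) {
        long needed = (long) (expectedBlocks * thresholdPct);
        return reportedBlocks >= needed;
    }

    public static void main(String[] args) {
        double threshold = 0.999; // illustrative value, assumed for this sketch

        // The delete was never persisted: the namespace still expects 10000
        // blocks, but 50 of them were already removed on the DNs.
        System.out.println(canLeaveSafeMode(10_000, 9_950, threshold)); // false

        // Had the delete been persisted, only 9950 blocks would be expected.
        System.out.println(canLeaveSafeMode(9_950, 9_950, threshold)); // true
    }
}
```

In the first call the DNs can never report the 50 deleted blocks, so in this model the check keeps failing until an operator intervenes.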

      The reason could be that the file is deleted in memory and its blocks are added to invalidates, and only after this does the namenode try to sync the edits to the editlog file. By that time the NN has already asked the DNs to delete those blocks. The namenode then shuts down before persisting the delete to the editlog (the log lags behind).
      For this reason we may not get the INFO logs about the delete, and when we restart the namenode (in my scenario, again a switch), the namenode still expects these deleted blocks, because the delete request was never persisted to the editlog.

      I reproduced this scenario with debug points. I feel we should not add the blocks to invalidates before persisting to the editlog.
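The ordering problem described above can be sketched in a few lines of Java. This is a minimal model and not the actual FSNamesystem code: the classes EditLog and Namesystem, the methods deleteUnsafe and deleteSafe, and the crashBeforeSync flag are all hypothetical and exist only to show why scheduling block invalidation before the edit is durable can leave a restarted namenode expecting blocks that the DNs have already deleted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeleteOrderingSketch {
    /** Hypothetical stand-in for the edit log: records are durable only after sync(). */
    static class EditLog {
        final List<String> buffered = new ArrayList<>();
        final List<String> durable = new ArrayList<>();

        void logDelete(String path) { buffered.add("DELETE " + path); }

        void sync() { durable.addAll(buffered); buffered.clear(); }
    }

    /** Hypothetical stand-in for the namesystem. */
    static class Namesystem {
        final EditLog editLog = new EditLog();
        // Blocks that DNs will be told to delete.
        final List<Long> invalidateSet = new ArrayList<>();

        /** Problematic order: blocks are scheduled for deletion before the edit is durable. */
        void deleteUnsafe(String path, List<Long> blocks, boolean crashBeforeSync) {
            editLog.logDelete(path);
            invalidateSet.addAll(blocks);
            if (crashBeforeSync) return; // simulates kill -9 before the sync
            editLog.sync();
        }

        /** Proposed order: persist the edit first, then schedule the blocks. */
        void deleteSafe(String path, List<Long> blocks, boolean crashBeforeSync) {
            editLog.logDelete(path);
            if (crashBeforeSync) return; // simulates kill -9 before the sync
            editLog.sync();
            invalidateSet.addAll(blocks);
        }
    }

    public static void main(String[] args) {
        Namesystem unsafe = new Namesystem();
        unsafe.deleteUnsafe("/f", Arrays.asList(1L, 2L), true);
        // Blocks are queued for deletion although no delete record is durable.
        System.out.println(unsafe.invalidateSet.size() + " / " + unsafe.editLog.durable.size());

        Namesystem safe = new Namesystem();
        safe.deleteSafe("/f", Arrays.asList(1L, 2L), true);
        // Nothing is queued, so namespace and DN state still agree after restart.
        System.out.println(safe.invalidateSet.size() + " / " + safe.editLog.durable.size());
    }
}
```

With the safe ordering, a crash before the sync loses the delete itself, so the client would have to retry, but the blocks are still present on the DNs and the restarted namenode finds everything it expects.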

      Note: for the switch, we used kill -9 (force kill).

      I am currently on version 0.20.2. The same was verified on 0.23 as well, in a normal crash + restart scenario.

      1. HDFS-2815-Branch-1.patch
        8 kB
        Uma Maheswara Rao G
      2. HDFS-2815-branch-1.patch
        1 kB
        Uma Maheswara Rao G
      3. HDFS-2815-22-branch.patch
        1 kB
        Uma Maheswara Rao G
      4. HDFS-2815.patch
        1 kB
        Uma Maheswara Rao G
      5. HDFS-2815.patch
        1 kB
        Uma Maheswara Rao G

          Activity

          Matt Foley added a comment -

          Closed upon release of 1.1.1.

          Matt Foley added a comment -

          included in branch-1.1

          Suresh Srinivas added a comment -

          I have updated the fix version from 1.1.1 to 1.2.0. In case you disagree, you can change it back.

          Suresh Srinivas added a comment -

          Uma, you have marked the Fix version as 1.1.1. But I do not see this change in branch-1.1. Should 1.1.1 be removed from Fix Version?

          Uma Maheswara Rao G added a comment -

          Thanks a lot, Suresh, for the review.
          I have just committed this to branch-1 (as revision 1378664).

          Suresh Srinivas added a comment -

          +1 for the branch-1 patch.

          Uma Maheswara Rao G added a comment -

          @Suresh, could you please take a look at the branch-1 patch?
          If you +1 it, I will commit and resolve the issue.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12542794/HDFS-2815-branch-1.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3111//console

          This message is automatically generated.

          Uma Maheswara Rao G added a comment -

          Suresh, I have added the back-port patch in HDFS-3791, which is generated based on HDFS-173.

          Uma Maheswara Rao G added a comment -

          Thanks a lot, Suresh, for taking a look at this. Yes, for the last few days I have been busy with other tasks, but this is on my list and targeted for this weekend. First we have to generate a back-port patch for HDFS-173. If for any reason I cannot do it this weekend, I will take your help on this JIRA to move it forward.
          Thanks a lot for the help, Suresh.

          Suresh Srinivas added a comment -

          Uma, if you are busy, I can do this for branch-1.

          Suresh Srinivas added a comment -

          Uma, the second option makes sense. Let's back port HDFS-173 and then this patch. Thanks for doing it.

          Uma Maheswara Rao G added a comment -

          Yeah, Suresh.
          It would be good if I include HDFS-173 as well, because I have taken most of the parts from HDFS-173 and added the current fix.

          But per the comment, we decided not to include HDFS-173 completely in this patch.

          But after the fixes for the above comment, I think there won't be much difference from HDFS-173. So I will now consider HDFS-173 along with this patch and will include its tests as well.

          The other option is that I provide a back-port patch in HDFS-173 itself. After that issue is committed, we can add a straightforward patch for this JIRA. This may be clearer, right (instead of mixing other JIRA changes into this one)? How does this sound to you?

          Thanks,
          Uma

          Suresh Srinivas added a comment -

          Additionally, the change in INodeFile.java to set the INodeFile's blocks to null may be necessary in this patch as well.

          Suresh Srinivas added a comment -

          "I think I have to remove the inner synchronization block"

          I think you should remove the outer method synchronization and retain the inner synchronization. That way you do not sync the editlog while holding the lock.

          Is this patch quite a bit different from HDFS-173 with the above change? If so, should we just mark both HDFS-173 and HDFS-2815 as done for branch-1 also? The test from HDFS-173 can be included in this patch then?
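The locking suggestion above can be sketched as follows. This is only an illustration under stated assumptions, not the branch-1 patch: the class LockScopeSketch, the namesystemLock object, and the trace list are hypothetical. It shows a delete method that is not synchronized as a whole, takes the lock for the in-memory namespace change, releases it for the editlog sync, and takes it again to remove the blocks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LockScopeSketch {
    // Hypothetical stand-in for the namesystem-wide lock.
    private final Object namesystemLock = new Object();
    // Records each step and whether the lock was held, for illustration only.
    final List<String> trace = new ArrayList<>();

    /** Not synchronized as a whole: only the in-memory steps take the lock. */
    void delete(String path, List<Long> blocksOfPath) {
        List<Long> collectedBlocks;
        synchronized (namesystemLock) {
            // Step 1: remove the path from the namespace and buffer the edit.
            collectedBlocks = new ArrayList<>(blocksOfPath);
            trace.add("logDelete " + path + " lockHeld=" + Thread.holdsLock(namesystemLock));
        }

        // Step 2: sync the editlog without holding the lock, so other
        // operations are not blocked behind the disk write.
        trace.add("logSync lockHeld=" + Thread.holdsLock(namesystemLock));

        synchronized (namesystemLock) {
            // Step 3: only now schedule the collected blocks for removal.
            trace.add("removeBlocks(" + collectedBlocks.size() + ") lockHeld="
                    + Thread.holdsLock(namesystemLock));
        }
    }

    public static void main(String[] args) {
        LockScopeSketch ns = new LockScopeSketch();
        ns.delete("/dir/file", Arrays.asList(1L, 2L, 3L));
        ns.trace.forEach(System.out::println);
    }
}
```

Because the blocks are collected under the lock in step 1 and only handed to removal in step 3, the sync in between sits on the durable side of the ordering that this issue asks for.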

          Suresh Srinivas added a comment -

          @manish - did you mean to post the above comment in some other JIRA?

          manish v dunani added a comment -

          Have you tried turning safemode off with the command: bin/hadoop dfsadmin -safemode leave? I think it's compatible.

          Uma Maheswara Rao G added a comment -

          Thanks a lot Suresh for the review!

          I think I have to remove the inner synchronization block, as I am not protecting removeBlocks separately in a synchronized block. Also, this JIRA is not targeting the synchronization issue in "Large directory deletion issue", right?

          For the test, I couldn't find any clear assertion for the behaviour of this issue.
          Do you have any suggestions?

          Suresh Srinivas added a comment -

          Uma, thanks for posting the branch-1 patch. The FSNamesystem#deleteInternal() method should no longer be synchronized, right, given the internal synchronized block?

          Also should we add a unit test to this patch?

          Suresh Srinivas added a comment -

          "Suresh, could you please take a look at the branch-1 patch so that we can resolve this issue?"

          Sorry, I had not seen your comment, Uma. I will take a look at the patch.

          Matt Foley added a comment -

          Updated Fix Versions to match @Robert's changes to Target Versions.
          Changed Target Version 1.1.0 to 1.2.0, since the branch-1 patch was not reviewed and committed in time for 1.1.0.
          Please do proceed with the port to branch-1. Thanks.

          Uma Maheswara Rao G added a comment -

          Suresh, could you please take a look at the branch-1 patch so that we can resolve this issue?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517041/HDFS-2815-22-branch.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1948//console

          This message is automatically generated.

          Uma Maheswara Rao G added a comment -

          As this is an issue in 22 branch also, I just updated the patch based on 22 branch as well.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517008/HDFS-2815-Branch-1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1947//console

          This message is automatically generated.

          Uma Maheswara Rao G added a comment -

          Attached a patch for branch-1.
          This is mostly a refactoring taken from HDFS-173, and it deletes the blocks after logSync.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #990 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/990/)
          HDFS-2815. Namenode sometimes does not come out of safemode during NN crash + restart. Contributed by Uma Maheswara Rao. (Revision 1243673)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243673
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #168 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/168/)
          HDFS-2815. Merging change r1243673 from trunk to 0.23. (Revision 1243674)

          Result = FAILURE
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243674
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #955 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/955/)
          HDFS-2815. Namenode sometimes does not come out of safemode during NN crash + restart. Contributed by Uma Maheswara Rao. (Revision 1243673)

          Result = FAILURE
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243673
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Uma Maheswara Rao G added a comment -

          Ok, thanks for the clarification. I will do that.

          Suresh Srinivas added a comment -

          Only the changes required for this issue, though that may require quite a bit of HDFS-173.

          Uma Maheswara Rao G added a comment -

          Thanks a lot, Suresh, for the reviews.

          "@Uma do you want to take a stab at it?"

          Yes, I will do that.
          Before that: are you suggesting that I make only the changes required for this issue?
          Or should we first back-port HDFS-173 (as most of the code from HDFS-173 is required for HDFS-2815) and then update the patch for branch-1 accordingly?

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Build #195 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/195/)
          HDFS-2815. Merging change r1243673 from trunk to 0.23. (Revision 1243674)

          Result = FAILURE
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243674
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Commit #549 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/549/)
          HDFS-2815. Merging change r1243673 from trunk to 0.23. (Revision 1243674)

          Result = ABORTED
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243674
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1732 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1732/)
          HDFS-2815. Namenode sometimes does not come out of safemode during NN crash + restart. Contributed by Uma Maheswara Rao. (Revision 1243673)

          Result = ABORTED
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243673
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Commit #533 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/533/)
          HDFS-2815. Merging change r1243673 from trunk to 0.23. (Revision 1243674)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243674
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1721 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1721/)
          HDFS-2815. Namenode sometimes does not come out of safemode during NN crash + restart. Contributed by Uma Maheswara Rao. (Revision 1243673)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243673
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hudson added a comment -

          Integrated in Hadoop-Common-0.23-Commit #545 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/545/)
          HDFS-2815. Merging change r1243673 from trunk to 0.23. (Revision 1243674)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243674
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1795 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1795/)
          HDFS-2815. Namenode sometimes does not come out of safemode during NN crash + restart. Contributed by Uma Maheswara Rao. (Revision 1243673)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243673
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Suresh Srinivas added a comment -

          I committed the patch to 0.24 and 0.23. Thank you Uma.

          We should fix this for 1.1.0 release. However that is non-trivial since it requires parts of the functionality from HDFS-173.
          @Uma do you want to take a stab at it?

          Uma Maheswara Rao G added a comment -

          Hi Suresh, Could you please take a look for committing? -Thanks

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12514253/HDFS-2815.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1865//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1865//console

          This message is automatically generated.

          Todd Lipcon added a comment -

          Oops, sorry, I phrased that poorly (I agree that HDFS-173 didn't cause the problem, but rather just happened to edit the same area of code). Thanks for the details, Suresh.

          Uma Maheswara Rao G added a comment -

          Yes, Suresh.
          I agree with you: HDFS-173 is not the cause, as I already said in my previous comment.
          Thanks a lot for the explanation, and thanks for the review.
          I have updated the patch to remove the unused variable deleteNow.

          Suresh Srinivas added a comment -

          Uma, +1 for the patch. Please remove the deleteNow variable in the next version of the patch.

          Suresh Srinivas added a comment -

          Todd wrote: "Linking HDFS-173, the patch that added the problematic code."

          HDFS-173 is not the cause. Before HDFS-173, the sequence was:

          1. Delete the directory, files and blocks while holding the lock. This could trigger deletion of the blocks at the datanodes.
          2. Then add the editlog entry outside the lock.

          As this jira discussion demonstrates, if the NN crashes between these two steps, the blocks may already have been deleted on the DNs while no record of the deletion exists in the editlog.

          With HDFS-173, the behavior changed to:

          1. Delete the directory, files and blocks while holding the lock. This could trigger deletion of the blocks at the datanodes if the number of blocks is small.
          2. Then add the editlog entry outside the lock.
          3. New change: delete the blocks if the number of blocks is large.

          Note that the part Uma is talking about is step 1, which is still the old behavior.

          The patch now proposes deleting the blocks only after the deletion is recorded in the editlog, as in step 3 of HDFS-173. I think this sounds fine.
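
To make the crash window described above concrete, here is a minimal toy model in Java. It is only a sketch, not the FSNamesystem code: the class DeleteOrderingSketch, its fields, and the helper missingAfterCrash are hypothetical names introduced for illustration, and the "datanodes" and "editlog" are plain in-memory collections. It simulates an NN crash right after the first step of each ordering and counts the blocks the restarted NN would still expect but no DN could report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the two delete orderings; all names here are hypothetical. */
public class DeleteOrderingSketch {
    /** Blocks still physically present on the simulated datanodes. */
    final Set<String> blocksOnDatanodes = new HashSet<>();
    /** Operations that reached the simulated durable editlog. */
    final List<String> editLog = new ArrayList<>();

    /**
     * Deletes a file. If logFirst is false, blocks are invalidated before the
     * editlog entry is written (the old step 1 / step 2 order). If
     * crashAfterFirstStep is true, the NN "dies" after the first of the two steps.
     */
    void delete(String file, List<String> blocks, boolean logFirst, boolean crashAfterFirstStep) {
        if (logFirst) {
            editLog.add("DELETE " + file);
            if (crashAfterFirstStep) return;
            blocksOnDatanodes.removeAll(blocks);
        } else {
            blocksOnDatanodes.removeAll(blocks);
            if (crashAfterFirstStep) return;
            editLog.add("DELETE " + file);
        }
    }

    /** Number of blocks the restarted NN expects but cannot find on any DN. */
    static int missingAfterCrash(boolean logFirst) {
        DeleteOrderingSketch nn = new DeleteOrderingSketch();
        List<String> blocks = Arrays.asList("blk_1", "blk_2", "blk_3");
        nn.blocksOnDatanodes.addAll(blocks);
        nn.delete("/f", blocks, logFirst, true);

        // On restart the file still exists unless its delete was logged.
        boolean fileSurvivesReplay = !nn.editLog.contains("DELETE /f");
        int missing = 0;
        if (fileSurvivesReplay) {
            for (String b : blocks) {
                if (!nn.blocksOnDatanodes.contains(b)) missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("invalidate-then-log: " + missingAfterCrash(false)); // 3
        System.out.println("log-then-invalidate: " + missingAfterCrash(true));  // 0
    }
}
```

In this model, the log-first ordering leaves at worst some orphaned blocks on the DNs, which the NN no longer expects, whereas the invalidate-first ordering leaves the NN waiting for blocks that no longer exist.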

          Uma Maheswara Rao G added a comment -

          I forgot to remove the unused variable deleteNow in this patch. I will fix that in the next patch, along with any feedback you have on the proposed fix.

          Uma Maheswara Rao G added a comment -

          Ok, thanks Todd.
          It seems to me that in older versions we always deleted the blocks before syncing. As of HDFS-173, we delete the blocks before syncing only if there are at most BLOCK_DELETION_INCREMENT of them (default 1000).
          I guess HDFS-173 was meant to solve the problem of large block deletes, so it was probably not intended to change the behaviour for smaller deletes.
          Suresh can confirm the exact reason for this.
          Suresh also confirmed offline that he will take a look soon.

          Todd Lipcon added a comment -

          Linking HDFS-173, the patch that added the problematic code. I will ping Suresh to take a look at this as well – it seems like there was an explicit choice to issue the small deletes before syncing to the edit log, but not sure why.

          Uma Maheswara Rao G added a comment -

          Could someone please review this?

          Uma Maheswara Rao G added a comment -

          "-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings."

          This warning is not related to this patch; it is being handled in HDFS-2835.
          The test failures are also unrelated to this patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12511806/HDFS-2815.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hdfs.TestFSInputChecker

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1827//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1827//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1827//console

          This message is automatically generated.

          Uma Maheswara Rao G added a comment -

          To get started, I have uploaded the patch that I proposed in my previous comment.

          Uma Maheswara Rao G added a comment -

          I discussed this issue with Todd offline. Thanks a lot Todd for the confirmation.

          In trunk, we currently remove the blocks in memory before the logSync if the number of blocks is less than or equal to BLOCK_DELETION_INCREMENT:

          deleteNow = collectedBlocks.size() <= BLOCK_DELETION_INCREMENT;
          if (deleteNow) { // Perform small deletes right away
            removeBlocks(collectedBlocks);
          }

          If there are more than BLOCK_DELETION_INCREMENT blocks, we delete them after the logSync.

          I think we can always delete the blocks after the logSync and simply remove this BLOCK_DELETION_INCREMENT check.

          @Todd, do you see any other impacts with this proposal here?
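
As a rough illustration of the proposal above, the following Java sketch records the order of operations when blocks are always removed after logSync. It is a stand-in under stated assumptions, not the actual patch: the class DeleteAfterSyncSketch, the events list, and the run helper are hypothetical, the logSync and removeBlocks methods here only log their invocation, and removing blocks in batches of BLOCK_DELETION_INCREMENT is an assumption about how large deletes are chunked.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of "always remove blocks after logSync"; all names are stand-ins. */
public class DeleteAfterSyncSketch {
    /** Default value quoted in this discussion. */
    static final int BLOCK_DELETION_INCREMENT = 1000;

    /** Records the order in which the simulated operations happen. */
    final List<String> events = new ArrayList<>();

    /** Stand-in for syncing the editlog to durable storage. */
    void logSync() {
        events.add("logSync");
    }

    /** Stand-in for adding a batch of blocks to the invalidate sets. */
    void removeBlocks(List<String> batch) {
        events.add("removeBlocks(" + batch.size() + ")");
    }

    void delete(List<String> collectedBlocks) {
        // Unlink the file from the namespace and collect its blocks (under the lock).
        events.add("unlinkFromNamespace");
        // Persist the delete before any block can be invalidated.
        logSync();
        // No deleteNow shortcut: small and large deletes take the same path.
        for (int from = 0; from < collectedBlocks.size(); from += BLOCK_DELETION_INCREMENT) {
            int to = Math.min(from + BLOCK_DELETION_INCREMENT, collectedBlocks.size());
            removeBlocks(collectedBlocks.subList(from, to));
        }
    }

    /** Runs one simulated delete of blockCount blocks and returns the event order. */
    static List<String> run(int blockCount) {
        List<String> blocks = new ArrayList<>();
        for (int i = 0; i < blockCount; i++) {
            blocks.add("blk_" + i);
        }
        DeleteAfterSyncSketch fs = new DeleteAfterSyncSketch();
        fs.delete(blocks);
        return fs.events;
    }

    public static void main(String[] args) {
        System.out.println(run(3));
        System.out.println(run(2500));
    }
}
```

With this shape, a delete of 3 blocks and a delete of 2500 blocks both show logSync ahead of every removeBlocks call, which is the property the proposal is after.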


            People

            • Assignee: Uma Maheswara Rao G
            • Reporter: Uma Maheswara Rao G
            • Votes: 0
            • Watchers: 15
