Hadoop HDFS / HDFS-6696

Name node cannot start if the path of a file under construction contains ".snapshot"

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.0
    • Component/s: None
    • Labels: None
    • Target Version/s:

      Description

      Using -renameReserved to rename ".snapshot" in a pre-hdfs-snapshot feature fsimage during upgrade only works if there is nothing under construction under the renamed directory. I am not sure whether it takes care of edits containing ".snapshot" properly.

      The workaround is to identify these directories and rename them, then run saveNamespace before performing the upgrade.
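
      The "identify these directories" step can be sketched roughly as follows. This is a minimal illustration, not NameNode code: it assumes the namespace has already been dumped to a list of absolute path strings (for example from a recursive listing), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for the workaround; not part of HDFS. */
class SnapshotPathFinder {
    static final String RESERVED = ".snapshot";

    /**
     * Returns the shortest prefix of the path that ends in a ".snapshot"
     * component, or null if no component is exactly ".snapshot".
     */
    public static String reservedPrefix(String path) {
        StringBuilder prefix = new StringBuilder();
        for (String component : path.split("/")) {
            if (component.isEmpty()) {
                continue; // skip the empty string before the leading slash
            }
            prefix.append('/').append(component);
            if (component.equals(RESERVED)) {
                return prefix.toString();
            }
        }
        return null;
    }

    /** Collects the distinct ".snapshot" directories that would need renaming. */
    public static Set<String> directoriesToRename(List<String> paths) {
        Set<String> result = new LinkedHashSet<>();
        for (String path : paths) {
            String prefix = reservedPrefix(path);
            if (prefix != null) {
                result.add(prefix);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
            "/xxx/yyy/.snapshot/zzz/aaa",
            "/xxx/yyy/.snapshot/zzz/bbb",
            "/data/.snapshots/ok",   // not reserved: the component differs
            "/tmp/file");
        System.out.println(directoriesToRename(paths));
    }
}
```

      Each directory reported this way would then be renamed on the old version and the namespace saved (saveNamespace requires the NameNode to be in safe mode) before the upgrade is started.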

      1. hdfs-6696.001.patch
        12 kB
        Andrew Wang
      2. hdfs-6696.002.patch
        12 kB
        Andrew Wang
      3. hdfs-6696.003.patch
        8 kB
        Andrew Wang

        Activity

        Andrew Wang added a comment -

        Thanks for finding this Kihwal, lemme take a look.

        Kihwal Lee added a comment - - edited

        Here is the stack trace.

        java.io.FileNotFoundException: File does not exist: /xxx/yyy/.snapshot/zzz/aaa
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadFilesUnderConstruction(FSImageFormat.java:937)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:424)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:902)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:888)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:711)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:649)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:359)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:259)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:641)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:526)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:582)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:747)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:731)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1381)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1447)

        Andrew Wang added a comment -

        Poking around on JIRA, snapshots were added in 2.1.0, and -renameReserved was added in 2.4.0. Thus we had a few releases where -renameReserved wasn't available at all. I don't think this current behavior is a regression from 2.4 either, since -renameReserved hasn't changed much.

        All that considered, should this be marked as a blocker? I'll still work on it ASAP, but I know Karthik Kambatla is itching to roll a 2.5.

        Andrew Wang added a comment -

        Hey Kihwal Lee, what Hadoop version were you starting with and upgrading to? I did some tests upgrading from a 1.2.1 image to trunk which I think repros this, but I'd like to verify that my WIP fix works for your situation.

        Mit Desai added a comment -

        Andrew Wang, we were trying to upgrade 0.21.11 to 2.4.0

        Andrew Wang added a comment -

        Thanks Mit, I assume you meant 0.23.11.

        Patch attached. I found that branch-1 calls a different method to add paths, and the rename function needs to be called for under-construction (UC) files too. Added some tests with some image+edits from 1.2.1 and 0.23.11. Patch was generated with "git diff --binary".
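
        Why the rename has to reach UC files as well can be shown with a toy model. This is only a sketch under stated assumptions, not the patch: the namespace is modeled as a set of path strings, the UC list as separately stored path strings that are looked up by path (as the loadFilesUnderConstruction frame in the stack trace above suggests), and the class name, method names, and the replacement name ".user-snapshot" are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the failure; not the actual HDFS-6696 change. */
class UcRenameSketch {

    /** Replaces every path component that is exactly ".snapshot". */
    public static String renameReserved(String path, String replacement) {
        String[] components = path.split("/", -1);
        for (int i = 0; i < components.length; i++) {
            if (components[i].equals(".snapshot")) {
                components[i] = replacement;
            }
        }
        return String.join("/", components);
    }

    /** Counts UC paths that cannot be found in the loaded namespace. */
    public static int missing(Set<String> namespace, List<String> ucPaths) {
        int count = 0;
        for (String path : ucPaths) {
            if (!namespace.contains(path)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String original = "/xxx/yyy/.snapshot/zzz/aaa";

        // The namespace was loaded with the reserved component renamed.
        Set<String> namespace = new HashSet<>();
        namespace.add(renameReserved(original, ".user-snapshot"));

        // UC entries still carry the old path: the lookup misses.
        List<String> ucRaw = Arrays.asList(original);
        System.out.println("misses without rename: " + missing(namespace, ucRaw));

        // Applying the same rename to the UC entries makes them resolvable.
        List<String> ucRenamed = Arrays.asList(renameReserved(original, ".user-snapshot"));
        System.out.println("misses with rename: " + missing(namespace, ucRenamed));
    }
}
```

        In this model the first lookup misses in the same way as the FileNotFoundException reported above, and the second succeeds once both sides use the renamed path.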

        Andrew Wang added a comment -

        Forgot to make this patch available...review would still be appreciated, since this is a blocker.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12657038/hdfs-6696.001.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7447//console

        This message is automatically generated.

        Andrew Wang added a comment -

        Rebased, still `git diff --binary`

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12657475/hdfs-6696.002.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7449//console

        This message is automatically generated.

        Andrew Wang added a comment -

        Same patch without the binary diff. All of these apply fine for me with `patch -p0 -E < thePatch`, so I'm not sure what's going on.

        Jing Zhao added a comment -

        The patch looks good to me. The new unit tests also pass on my local machine after applying the binary changes. +1 pending Jenkins.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12657664/hdfs-6696.003.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
        org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
        org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7456//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7456//console

        This message is automatically generated.

        Andrew Wang added a comment -

        I ran the failed tests successfully locally, so I think we're good. Thanks for reviewing Jing, I'll commit this shortly to all the branches.

        Andrew Wang added a comment -

        Committed to trunk, branch-2, branch-2.5. Thanks again Kihwal for reporting, Jing for reviewing.

        I did mess up the trunk commit message a bit, forgot to put the JIRA #.


          People

          • Assignee: Andrew Wang
          • Reporter: Kihwal Lee
          • Votes: 0
          • Watchers: 9
