Hadoop HDFS / HDFS-1612

HDFS Design Documentation is outdated

    Details

    • Hadoop Flags:
      Reviewed

      Description

      I was trying to discover details about the Secondary NameNode, and came across the description below in the HDFS design doc.

      The NameNode keeps an image of the entire file system namespace and file Blockmap in memory. This key metadata item is designed to be compact, such that a NameNode with 4 GB of RAM is plenty to support a huge number of files and directories. When the NameNode starts up, it reads the FsImage and EditLog from disk, applies all the transactions from the EditLog to the in-memory representation of the FsImage, and flushes out this new version into a new FsImage on disk. It can then truncate the old EditLog because its transactions have been applied to the persistent FsImage. This process is called a checkpoint. In the current implementation, a checkpoint only occurs when the NameNode starts up. Work is in progress to support periodic checkpointing in the near future.

      (emphasis mine).

      Note that this directly conflicts with information in the HDFS user guide, http://hadoop.apache.org/common/docs/r0.20.2/hdfs_user_guide.html#Secondary+NameNode
      and http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Checkpoint+Node

      I haven't done a thorough audit of that doc; I only noticed the above inaccuracy.
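
      For readers unfamiliar with the checkpoint steps in the quoted passage, here is a minimal sketch of them in Python. It is a toy model only: the class `ToyNameNode`, its fields, and the `("create" | "delete", path, value)` transaction format are all hypothetical and do not correspond to the real HDFS classes or on-disk formats.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy model of a checkpoint; hypothetical, not the real HDFS classes."""

    fsimage_on_disk: dict = field(default_factory=dict)  # path -> metadata
    editlog_on_disk: list = field(default_factory=list)  # pending transactions

    def checkpoint(self) -> dict:
        # 1. Load the persisted FsImage into memory.
        namespace = dict(self.fsimage_on_disk)
        # 2. Replay every EditLog transaction against the in-memory image.
        for op, path, value in self.editlog_on_disk:
            if op == "create":
                namespace[path] = value
            elif op == "delete":
                namespace.pop(path, None)
            else:
                raise ValueError(f"unknown operation: {op}")
        # 3. Flush the merged image out as the new FsImage.
        self.fsimage_on_disk = dict(namespace)
        # 4. The old EditLog is now redundant, so truncate it.
        self.editlog_on_disk = []
        return namespace


node = ToyNameNode(
    fsimage_on_disk={"/a": {"replication": 3}},
    editlog_on_disk=[
        ("create", "/b", {"replication": 2}),
        ("delete", "/a", None),
    ],
)
namespace = node.checkpoint()
```

      In this sketch the merge runs only when `checkpoint()` is called, which mirrors the "only occurs when the NameNode starts up" wording of the quoted text; the user guide links above describe nodes that perform the same merge periodically.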

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-22-branch #35 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/35/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #547 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/547/)

          Joe Crobak added a comment -

          > The test-patch script actually is able to detect documentation changes. So it won't get a -1 from not adding tests.

          That's strange. When I ran test-patch locally (from the hdfs project), I got a -1 for failing to add new tests.

          > (It would be great if you could also fix HDFS-1388.)

          I can take a look.

          Tsz Wo Nicholas Sze added a comment -

          I have committed this. Thanks, Joe!

          (It would be great if you could also fix HDFS-1388.)

          Tsz Wo Nicholas Sze added a comment -

          > It would be great if you can update the doc. See also HDFS-1612.
          I meant to say HDFS-1388, not HDFS-1612.

          Tsz Wo Nicholas Sze added a comment -

          > Note that this patch will get a -1 from hudson because it doesn't include any new tests. It's just a document update, though, so no new tests are expected.

          The test-patch script actually is able to detect documentation changes. So it won't get a -1 from not adding tests.

          Obviously, the -1 core and contrib tests are not related to this.

          +1 patch looks good.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12471796/HDFS-1612.patch
          against trunk revision 1072023.

          +1 @author. The patch does not contain any @author tags.

          +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.TestFileConcurrentReader

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/208//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/208//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/208//console

          This message is automatically generated.
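
          To pick out which checks failed in a report shaped like the one above, a small parser is enough. This is a sketch only: the sample text is abridged from the report in this comment, the names `VOTE_LINE` and `parse_votes` are hypothetical, and it assumes every vote line begins with `+1`, `+0`, or `-1` followed by the check name and a period. It says nothing about how test-patch itself computes the overall vote.

```python
import re

# Abridged from the Hadoop QA report above.
REPORT = """\
-1 overall. Here are the results of testing the latest attachment
+1 @author. The patch does not contain any @author tags.
+0 tests included. The patch appears to be a documentation patch that doesn't require tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 core tests. The patch failed these core unit tests:
-1 contrib tests. The patch failed contrib unit tests.
"""

# A vote line starts with +1, +0, or -1, then the check name up to the first period.
VOTE_LINE = re.compile(r"^\s*([+-][01])\s+([^.]+)\.")


def parse_votes(report: str) -> dict:
    """Map each check name to its integer vote (-1, 0, or 1)."""
    votes = {}
    for line in report.splitlines():
        match = VOTE_LINE.match(line)
        if match:
            votes[match.group(2).strip()] = int(match.group(1))
    return votes


votes = parse_votes(REPORT)
# Individual checks that voted -1, excluding the summary line.
failed = sorted(
    name for name, vote in votes.items() if vote < 0 and name != "overall"
)
```

          Here `failed` holds the core and contrib test checks, while "tests included" is recorded with a vote of 0.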

          Joe Crobak added a comment -

          Note that this patch will get a -1 from hudson because it doesn't include any new tests. It's just a document update, though, so no new tests are expected.

          Joe Crobak added a comment -

          First pass at an update to the hdfs_design doc.

          Changes:

          • "scale to hundreds of nodes" -> "scale to thousands of nodes"
          • Changes to reflect append and hflush features.
          • Mention support for user quotas.
          • Fixed a typo – stray gg/
          • Mention checkpoint and backup nodes.

          There are a few other things that might be updated:

          • "HDFS does not currently support snapshots but will in a future release" – but HDFS-233 hasn't been updated since June, 2010.
          • "Work is in progress to expose HDFS through the WebDAV protocol" – either reference https://github.com/huyphan/HDFS-over-Webdav or remove this? HDFS-225 hasn't been updated since August 2009.
          • It's unclear to me if the rebalancing section needs to be updated. The hadoop balancer is a manual process, AFAIK, so what is there is technically accurate.
          Tsz Wo Nicholas Sze added a comment -

          > I'd be happy to create a patch. I did a quick look over the hdfs design doc, and there seem to be a few new features of 0.21 (e.g. append, symbolic links) as well as some older features (e.g. quotas) that aren't documented correctly. ...

          It would be great if you can update the doc. See also HDFS-1612.

          Mahadev konar added a comment -

          +1 for updating append/sym links as well.

          Joe Crobak added a comment -

          > This portion of the HDFS document is outdated. Would you like to submit a patch that brings it up to date?

          I'd be happy to create a patch. I took a quick look over the hdfs design doc, and there seem to be a few new features of 0.21 (e.g. append, symbolic links) as well as some older features (e.g. quotas) that aren't documented correctly. I'd be happy to update those, too, either as part of this patch or separately.

          dhruba borthakur added a comment -

          This portion of the HDFS document is outdated. Would you like to submit a patch that brings it up to date?


            People

            • Assignee: Joe Crobak
            • Reporter: Joe Crobak
            • Votes: 0
            • Watchers: 3
