Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0-alpha
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently in the HDFS-1073 branch, the code for creating output streams lives in FileJournalManager, while the code for creating input streams lives in the inspectors. This change does several things:

      • Input and output streams are now created by the JournalManager.
      • FSImageStorageInspector now deals with URIs when referring to edit logs.
      • Recovery of in-progress logs is performed by counting the number of transactions rather than by looking at the length of the file.

      The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.
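The recovery point above can be illustrated with a small sketch. This is not the actual Hadoop API; the class, method names, and the trivial record layout (one txid per record, zero-filled preallocated tail) are hypothetical, chosen only to show why a transaction count is a more reliable recovery boundary than raw file length for an in-progress segment:

```java
public class JournalRecoverySketch {
    // Count contiguous valid transactions from the start of an in-progress
    // segment. A preallocated, zero-filled tail (or a txid that breaks the
    // expected sequence) ends the scan, so trailing garbage left by a crash
    // does not inflate the result the way file length would.
    static long countValidTransactions(long[] records, long firstTxId) {
        long expected = firstTxId;
        long count = 0;
        for (long txId : records) {
            if (txId != expected) {
                break; // preallocated space or corruption: stop counting
            }
            count++;
            expected++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Segment starting at txid 101: three valid entries followed by a
        // zero-filled preallocated tail. File length suggests six records;
        // counting transactions correctly recovers only three.
        long[] inProgress = {101, 102, 103, 0, 0, 0};
        System.out.println(countValidTransactions(inProgress, 101));
    }
}
```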

      1. HDFS-2018.diff
        115 kB
        Ivan Kelly
      2. HDFS-2018.diff
        115 kB
        Ivan Kelly
      3. HDFS-2018.diff
        106 kB
        Ivan Kelly
      4. HDFS-2018.diff
        106 kB
        Ivan Kelly
      5. HDFS-2018.diff
        106 kB
        Ivan Kelly
      6. HDFS-2018.diff
        106 kB
        Ivan Kelly
      7. HDFS-2018.diff
        106 kB
        Ivan Kelly
      8. HDFS-2018.diff
        105 kB
        Ivan Kelly
      9. HDFS-2018.diff
        105 kB
        Ivan Kelly
      10. HDFS-2018.diff
        104 kB
        Ivan Kelly
      11. HDFS-2018.diff
        89 kB
        Ivan Kelly
      12. HDFS-2018.diff
        149 kB
        Ivan Kelly
      13. HDFS-2018.diff
        149 kB
        Ivan Kelly
      14. HDFS-2018.diff
        137 kB
        Ivan Kelly
      15. HDFS-2018.diff
        121 kB
        Ivan Kelly
      16. HDFS-2018.diff
        121 kB
        Ivan Kelly
      17. HDFS-2018.diff
        125 kB
        Ivan Kelly
      18. HDFS-2018.diff
        101 kB
        Ivan Kelly
      19. HDFS-2018.diff
        101 kB
        Ivan Kelly
      20. HDFS-2018.diff
        101 kB
        Ivan Kelly
      21. HDFS-2018.diff
        80 kB
        Ivan Kelly
      22. HDFS-2018.diff
        87 kB
        Ivan Kelly
      23. HDFS-2018.diff
        87 kB
        Ivan Kelly
      24. hdfs-2018.txt
        106 kB
        Todd Lipcon
      25. hdfs-2018-otherapi.txt
        102 kB
        Todd Lipcon

        Issue Links

          Activity

          Tsz Wo Nicholas Sze added a comment -

          I have merged this to 0.23.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #809 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/809/)
          HDFS-2018. Move all journal stream management code into one place. Contributed by Ivan Kelly.

          jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1165826
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Checkpointer.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImagePreTransactionalStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageTransactionalStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckPointForSecurityTokens.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameEditsConfigs.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #786 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/786/)
          HDFS-2018. Move all journal stream management code into one place. Contributed by Ivan Kelly.

          jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1165826
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Checkpointer.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImagePreTransactionalStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageTransactionalStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckPointForSecurityTokens.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameEditsConfigs.java
          Jitendra Nath Pandey added a comment -

          Committed. Thanks to Ivan for the patch and Todd for reviews!

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #847 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/847/)
          HDFS-2018. Move all journal stream management code into one place. Contributed by Ivan Kelly.

          jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1165826
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Checkpointer.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImagePreTransactionalStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageTransactionalStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckPointForSecurityTokens.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameEditsConfigs.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #913 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/913/)
          HDFS-2018. Move all journal stream management code into one place. Contributed by Ivan Kelly.

          jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1165826
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Checkpointer.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImagePreTransactionalStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageTransactionalStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckPointForSecurityTokens.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameEditsConfigs.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #836 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/836/)
          HDFS-2018. Move all journal stream management code into one place. Contributed by Ivan Kelly.

          jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1165826
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Checkpointer.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImagePreTransactionalStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageTransactionalStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckPointForSecurityTokens.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageStorageInspector.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameEditsConfigs.java
          Ivan Kelly added a comment -

          BackupNode is now fixed on trunk (HDFS-2310). The TestDfsOverAvroRpc, TestHost2NodesMap and TestReplicasMap failures are 45, 61 and 61 builds old, respectively.

          Ivan Kelly added a comment -

          BackupNode failure occurs in trunk also.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12493011/HDFS-2018.diff
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed these unit tests:

          org.apache.hadoop.hdfs.server.namenode.TestBackupNode
          org.apache.hadoop.hdfs.TestDfsOverAvroRpc
          org.apache.hadoop.hdfs.server.blockmanagement.TestHost2NodesMap
          org.apache.hadoop.hdfs.server.datanode.TestReplicasMap

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1193//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1193//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1193//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1193//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          Update to take FSConstants->HdfsConstants change into account.

          Todd Lipcon added a comment -

          OK. I still think it's a worse API than the other patch I had attached a while back, for the reasons mentioned above, but I won't block it. Go ahead and commit.

          Jitendra Nath Pandey added a comment -

          Todd, are you OK with committing this now? The patch is in line with what we had agreed on before.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12492616/HDFS-2018.diff
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          org.apache.hadoop.hdfs.TestDfsOverAvroRpc
          org.apache.hadoop.hdfs.server.blockmanagement.TestHost2NodesMap
          org.apache.hadoop.hdfs.server.datanode.TestReplicasMap
          org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1185//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1185//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1185//console

          This message is automatically generated.

          jiraposter@reviews.apache.org added a comment -

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > In doing this review, I remembered some reasoning about why I wanted to introduce the EditLogReference interface:

          >

          > 1) The lifecycle of the open edit streams is now unbalanced – the streams are all opened in one place and then closed far away in loadEdits() later. If there's an error in one of the middle streams, it will leak the open files of the other streams that come after it. Introducing EditLogReference (a reference to a stream that's not opened yet) means that loadEdits(EditLogReference) has the whole lifecycle of an open stream.

          > 2) The old implementation had true unit tests for the various gap scenarios, which I found easy to read. The new tests are much harder to read, with the "AbortSpec" stuff, plus it relies on actually creating real files on disk, rather than mock objects. Having the FileJournalManager only deal with references in the "planning" state, and moving the validation to the reference, means we can test all of this code with only mocks.

          >

          > Specific comments on the implementation below.

          1) This uses a factory pattern, where it's quite common for allocation and closing to be in different parts of the code. I've addressed the leaked streams. The problem here is that we're selecting all streams before we use them. In the original patch, #loadEdits() would select the logs as it went along. Another solution would be to add an #open() method to EditLogInputStream, which would be similar to the reference approach, though it seems strange, as streams are generally opened on instantiation in Java.

          2) Which scenarios in particular are you referring to?
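The fix described for point 1 can be sketched as follows. This is only an illustrative shape (the class and method names are hypothetical, not the actual HDFS-2018 code): every selected stream is closed in a finally block, so a failure while reading one stream no longer leaks the streams that come after it.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

class StreamLifecycleSketch {
    // Read from each stream in order; close all of them regardless of errors.
    static void loadAll(List<? extends Closeable> streams) throws IOException {
        try {
            for (Closeable s : streams) {
                // ... apply transactions from s; may throw partway through ...
            }
        } finally {
            // Runs even if a middle stream failed, so later streams are not leaked.
            for (Closeable s : streams) {
                try {
                    s.close();
                } catch (IOException e) {
                    // log and keep closing the rest
                }
            }
        }
    }
}
```

The key property is that the close loop is independent of how far the read loop got, which is what balances the "opened in one place, closed far away" lifecycle.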

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java, line 270

          > <https://reviews.apache.org/r/1688/diff/1/?file=36984#file36984line270>

          >

          > If you pass "target - 1" above, then why do we have to explicitly remove it again? It already should be excluded here.

          The "target - 1" passed to selectInputStreams means that it should throw an exception if it doesn't find enough transactions to get to at least target - 1. It's not a limit, though; selectInputStreams will try to return all streams which can serve transactions from lastAppliedTxId onwards.
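A toy model of that contract might look like the following. This is illustrative only (Segment and the method shape are hypothetical, not the real JournalManager API): the required txid is a minimum guarantee, while the returned set extends as far forward as the logs allow.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SelectSketch {
    static class Segment {
        final long firstTxId, lastTxId;
        Segment(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }
    }

    // Return every segment that can serve transactions >= fromTxId.
    // Throw only if the selected segments cannot reach requiredTxId.
    static List<Segment> selectInputStreams(List<Segment> all, long fromTxId,
                                            long requiredTxId) throws IOException {
        List<Segment> selected = new ArrayList<>();
        long highest = fromTxId - 1;
        for (Segment s : all) {
            if (s.lastTxId >= fromTxId) {
                selected.add(s);
                highest = Math.max(highest, s.lastTxId);
            }
        }
        if (highest < requiredTxId) {   // a floor, not a ceiling
            throw new IOException("No stream reaches txid " + requiredTxId);
        }
        return selected;                // may contain txns far beyond requiredTxId
    }
}
```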

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java, lines 292-296

          > <https://reviews.apache.org/r/1688/diff/1/?file=36984#file36984line292>

          >

          > shouldn't there be exactly 1 stream here?

          >

          > What happens if you don't find any? there should be a check for stream == null

          Yes, exactly 1.

          I've added a check now. I was thinking of adding an assert (the synchronization should ensure a stream exists) but added a LOG.warn instead, as a retry would likely recover the situation anyhow.

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java, line 64

          > <https://reviews.apache.org/r/1688/diff/1/?file=36985#file36985line64>

          >

          > add a comment that this JM is never used for input, hence it can never provide transactions?

          done

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java, line 1072

          > <https://reviews.apache.org/r/1688/diff/1/?file=36990#file36990line1072>

          >

          > adjust javadoc to explain what "best" means. In this case, it's the one who has the most transactions readable after that txid

          done

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java, line 1088

          > <https://reviews.apache.org/r/1688/diff/1/?file=36990#file36990line1088>

          >

          > should log a warning

          done

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java, line 1132

          > <https://reviews.apache.org/r/1688/diff/1/?file=36990#file36990line1132>

          >

          > this error path leaks streams

          done

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java, line 491

          > <https://reviews.apache.org/r/1688/diff/1/?file=36991#file36991line491>

          >

          > while we're at it, we can add a good check here that lastTxId == INVALID_TXID || op.txid == lastTxid + 1 - log an ERROR and break the loop if that's hit, I think?

          done
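The suggested safeguard boils down to the following predicate (a sketch; the sentinel value and names are illustrative, not the actual HDFS constants):

```java
class TxidContiguitySketch {
    // Placeholder sentinel meaning "no transaction applied yet".
    static final long INVALID_TXID = -1;

    // Accept the op if nothing has been applied yet, or if it is the
    // immediate successor of the last applied transaction.
    static boolean inSequence(long lastTxId, long opTxId) {
        return lastTxId == INVALID_TXID || opTxId == lastTxId + 1;
    }
}
```

When the predicate fails mid-load, the loader logs an ERROR and breaks out of the loop rather than applying a transaction out of order.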

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java, lines 86-89

          > <https://reviews.apache.org/r/1688/diff/1/?file=36996#file36996line86>

          >

          > I don't think this logic makes sense - in the case of a journal manager going away and coming back inside the BN or SBN, you may want to finalize old segments while writing to a new one. That is to say, this should be independent of currentInProgress.

          done

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java, line 233

          > <https://reviews.apache.org/r/1688/diff/1/?file=36996#file36996line233>

          >

          > warning message should include the storage dir

          done

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java, line 234

          > <https://reviews.apache.org/r/1688/diff/1/?file=36996#file36996line234>

          >

          > don't you also need to break, here? Otherwise you will improperly log all larger logs as gaps

          done

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java, lines 302-305

          > <https://reviews.apache.org/r/1688/diff/1/?file=36996#file36996line302>

          >

          > what's the purpose of this helper function?

          it was useful when there was caching going on. I've removed it now and made the validateLog call direct.

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java, line 374

          > <https://reviews.apache.org/r/1688/diff/1/?file=36996#file36996line374>

          >

          > is this ever called?

          nope, removed.

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java, line 378

          > <https://reviews.apache.org/r/1688/diff/1/?file=36996#file36996line378>

          >

          > Add javadoc that this mutates the structure - it's not clear from the name of the function.

          done. Also removed the EditLogValidation being returned, as that's never used.

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java, line 80

          > <https://reviews.apache.org/r/1688/diff/1/?file=36997#file36997line80>

          >

          > please use American spelling to match the rest of Hadoop

          done

          On 2011-08-31 19:53:36, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java, lines 669-673

          > <https://reviews.apache.org/r/1688/diff/1/?file=37001#file37001line669>

          >

          > What would happen if the txid file hadn't been updated? Can we add a test for the case where we have an out-of-date seen_txid file, but we do have new logs? i.e:

          >

          > fsimage_0

          > edits_1-10

          > edits_inprogress_11 (with 5 txns)

          > seen_txid = 0

          >

          > Do we properly read all 15 txns? Do we have enough safeguards that we won't overwrite edits or do anything overlapping?

          >

          > In the current implementation (ie pre-patch), it would see the newer logs and know that it should attempt to read them. In this implementation, I think it might overwrite one of the segments.

          >

          > Please add a test case for this.

          >

          >

          >

          I've added a test for this. If the seen_txid is out of date (I guess this happens if a NN crashes after creating a segment, but before the update), the NN still starts up. I've reinstated CorruptionException to take care of this, and added new variants of doTestCrashRecoverEmptyLog to check it.
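The scenario above reduces to one rule, sketched here as intuition rather than the actual recovery code: the NN must trust whichever is further ahead, the recorded seen_txid or the highest transaction actually present in the logs, so a stale seen_txid of 0 still yields all 15 transactions.

```java
class RecoveryPointSketch {
    // The txid recovery should reach: never less than what the logs contain,
    // even when seen_txid is stale.
    static long recoveryTarget(long seenTxId, long highestTxIdInLogs) {
        return Math.max(seenTxId, highestTxIdInLogs);
    }
}
```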

          • Ivan

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1688/#review1703
          -----------------------------------------------------------

          On 2011-08-31 19:04:11, Todd Lipcon wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/1688/

          -----------------------------------------------------------

          (Updated 2011-08-31 19:04:11)

          Review request for Todd Lipcon.

          Summary

          -------

          Posting patch on review-board for easy review

          This addresses bug HDFS-2018.

          http://issues.apache.org/jira/browse/HDFS-2018

          Diffs

          -----

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java d72509c

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java 3cac667

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Checkpointer.java f754100

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java 8921bc0

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java 532b2f2

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java 52a3dd4

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java 495c42e

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java db98569

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java 8b25901

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImagePreTransactionalStorageInspector.java cec2eef

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageStorageInspector.java 65bfa0a

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageTransactionalStorageInspector.java 0814a14

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java 991d7f5

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java d62aaa7

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java 511e9c1

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java 4a8edb8

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckPointForSecurityTokens.java 4f698c0

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java 9e55946

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java a673c5f

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java a53a0bf

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageStorageInspector.java 113dcbc

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java 748caf4

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameEditsConfigs.java 31a7777

          Diff: https://reviews.apache.org/r/1688/diff

          Testing

          -------

          Thanks,

          Todd

Summary
-------

Posting patch on review-board for easy review

This addresses bug HDFS-2018.
http://issues.apache.org/jira/browse/HDFS-2018

Diffs
-----

hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java d72509c
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java 3cac667
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Checkpointer.java f754100
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java 8921bc0
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java 532b2f2
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java 52a3dd4
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java 495c42e
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java db98569
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java 8b25901
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImagePreTransactionalStorageInspector.java cec2eef
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageStorageInspector.java 65bfa0a
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageTransactionalStorageInspector.java 0814a14
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java 991d7f5
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java d62aaa7
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java 511e9c1
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java 4a8edb8
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckPointForSecurityTokens.java 4f698c0
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java 9e55946
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java a673c5f
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java a53a0bf
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageStorageInspector.java 113dcbc
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java 748caf4
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameEditsConfigs.java 31a7777

Diff: https://reviews.apache.org/r/1688/diff

Testing
-------

Thanks,

Todd
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1688/#review1703
          -----------------------------------------------------------

          In doing this review, I remembered some reasoning about why I wanted to introduce the EditLogReference interface:

          1) The lifecycle of the open edit streams is now unbalanced – the streams are all opened in one place and then closed far away in loadEdits() later. If there's an error in one of the middle streams, it will leak the open files of the other streams that come after it. Introducing EditLogReference (a reference to a stream that's not opened yet) means that loadEdits(EditLogReference) has the whole lifecycle of an open stream.
          2) The old implementation had true unit tests for the various gap scenarios, which I found easy to read. The new tests are much harder to read, with the "AbortSpec" stuff, plus it relies on actually creating real files on disk, rather than mock objects. Having the FileJournalManager only deal with references in the "planning" state, and moving the validation to the reference, means we can test all of this code with only mocks.

          Specific comments on the implementation below.

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java
          <https://reviews.apache.org/r/1688/#comment3858>

          If you pass "target - 1" above, then why do we have to explicitly remove it again? It already should be excluded here.
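As context for this comment: Ivan's reply in the thread explains that the txid passed to selectInputStreams is a floor that must be reachable, not a cap on what is returned. A simplified, hypothetical model of those semantics (the type and method names here are illustrative, not the actual HDFS API):

```java
import java.util.ArrayList;
import java.util.List;

public class SelectStreamsModel {
    /** Hypothetical stand-in for a stream covering txids [firstTxId, lastTxId]. */
    record StreamRef(long firstTxId, long lastTxId) {}

    /**
     * Returns every stream that can serve transactions from fromTxId onwards;
     * throws if the available streams cannot reach at least minTxId.
     */
    static List<StreamRef> selectInputStreams(List<StreamRef> available,
                                              long fromTxId, long minTxId) {
        List<StreamRef> out = new ArrayList<>();
        long highest = fromTxId - 1;
        for (StreamRef s : available) {
            if (s.lastTxId() >= fromTxId) {
                out.add(s);
                highest = Math.max(highest, s.lastTxId());
            }
        }
        // minTxId acts only as a validation threshold...
        if (highest < minTxId) {
            throw new IllegalStateException(
                "only found txns up to " + highest + ", need " + minTxId);
        }
        return out; // ...not a limit: streams past minTxId are still returned whole
    }

    public static void main(String[] args) {
        List<StreamRef> avail =
            List.of(new StreamRef(1, 10), new StreamRef(11, 20));
        // Asking for txns from 11 with a floor of 15 returns the whole 11-20 stream.
        System.out.println(selectInputStreams(avail, 11, 15).size()); // 1
    }
}
```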

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java
          <https://reviews.apache.org/r/1688/#comment3859>

          shouldn't there be exactly 1 stream here?

          What happens if you don't find any? there should be a check for stream == null

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
          <https://reviews.apache.org/r/1688/#comment3860>

          add a comment that this JM is never used for input, hence it can never provide transactions?

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          <https://reviews.apache.org/r/1688/#comment3862>

          adjust javadoc to explain what "best" means. In this case, it's the one who has the most transactions readable after that txid

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          <https://reviews.apache.org/r/1688/#comment3861>

          should log a warning

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
          <https://reviews.apache.org/r/1688/#comment3863>

          this error path leaks streams

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
          <https://reviews.apache.org/r/1688/#comment3864>

          while we're at it, we can add a good check here that lastTxId == INVALID_TXID || op.txid == lastTxid + 1 - log an ERROR and break the loop if that's hit, I think?
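The continuity check suggested here can be sketched as follows. This is a minimal, hypothetical illustration of the idea, not the real FSEditLogLoader code; INVALID_TXID and the op representation are stand-ins:

```java
import java.util.List;

public class TxidContinuityCheck {
    static final long INVALID_TXID = -1;

    /** Returns how many ops are applied before a txid gap (if any) stops replay. */
    static int applyOps(List<Long> txids) {
        long lastTxId = INVALID_TXID;
        int applied = 0;
        for (long txid : txids) {
            // The suggested check: each op must directly follow the previous one.
            if (lastTxId != INVALID_TXID && txid != lastTxId + 1) {
                System.err.println("ERROR: expected txid " + (lastTxId + 1)
                    + " but read " + txid + "; stopping replay");
                break;
            }
            lastTxId = txid;
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        System.out.println(applyOps(List.of(1L, 2L, 3L))); // 3: contiguous, all applied
        System.out.println(applyOps(List.of(1L, 2L, 5L))); // 2: gap after txid 2
    }
}
```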

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          <https://reviews.apache.org/r/1688/#comment3865>

          I don't think this logic makes sense - in the case of a journal manager going away and coming back inside the BN or SBN, you may want to finalize old segments while writing to a new one. That is to say, this should be independent of currentInProgress.

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          <https://reviews.apache.org/r/1688/#comment3866>

          warning message should include the storage dir

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          <https://reviews.apache.org/r/1688/#comment3867>

          don't you also need to break, here? Otherwise you will improperly log all larger logs as gaps
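To illustrate why the missing break matters: once one gap is found in a sorted segment list, every later segment starts at the "wrong" txid, so a scan without a break would report them all as gaps. A hypothetical sketch (not the actual FileJournalManager logic):

```java
import java.util.List;

public class GapScan {
    /** Hypothetical stand-in for a finalized edit segment [first, last]. */
    record Segment(long first, long last) {}

    /** Returns the txid where the first gap starts, or -1 if the log is contiguous. */
    static long findFirstGap(List<Segment> sorted, long fromTxId) {
        long expected = fromTxId;
        for (Segment s : sorted) {
            if (s.first() != expected) {
                // Stop at the first gap; continuing would flag every
                // subsequent segment as a gap as well.
                return expected;
            }
            expected = s.last() + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Segment> segs = List.of(
            new Segment(1, 10), new Segment(21, 30), new Segment(31, 40));
        System.out.println(findFirstGap(segs, 1)); // 11: txids 11-20 are missing
    }
}
```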

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          <https://reviews.apache.org/r/1688/#comment3868>

          what's the purpose of this helper function?

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          <https://reviews.apache.org/r/1688/#comment3869>

          is this ever called?

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
          <https://reviews.apache.org/r/1688/#comment3870>

          Add javadoc that this mutates the structure - it's not clear from the name of the function.

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java
          <https://reviews.apache.org/r/1688/#comment3871>

          please use American spelling to match the rest of Hadoop

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java
          <https://reviews.apache.org/r/1688/#comment3872>

          What would happen if the txid file hadn't been updated? Can we add a test for the case where we have an out-of-date seen_txid file, but we do have new logs? i.e.:

          fsimage_0
          edits_1-10
          edits_inprogress_11 (with 5 txns)
          seen_txid = 0

          Do we properly read all 15 txns? Do we have enough safeguards that we won't overwrite edits or do anything overlapping?

          In the current implementation (i.e. pre-patch), it would see the newer logs and know that it should attempt to read them. In this implementation, I think it might overwrite one of the segments.

          Please add a test case for this.
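The scenario above can be modeled numerically: seen_txid is only a lower bound, so startup must count the transactions actually present in the on-disk segments rather than trust the file. This is a hypothetical sketch of that arithmetic, not the real recovery code:

```java
import java.util.List;

public class SeenTxidScenario {
    /** Hypothetical segment: starts at txid `first` and holds `count` txns. */
    record Segment(long first, long count) {}

    /** Highest txid recoverable from the segments found on disk. */
    static long highestRecoverableTxId(List<Segment> segments) {
        long highest = 0;
        for (Segment s : segments) {
            highest = Math.max(highest, s.first() + s.count() - 1);
        }
        return highest;
    }

    public static void main(String[] args) {
        // fsimage_0, edits_1-10, edits_inprogress_11 with 5 txns, seen_txid = 0
        List<Segment> onDisk = List.of(new Segment(1, 10), new Segment(11, 5));
        long seenTxid = 0;
        // All 15 txns should be replayed even though seen_txid lags behind.
        System.out.println(Math.max(seenTxid, highestRecoverableTxId(onDisk))); // 15
    }
}
```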

          • Todd

          On 2011-08-31 19:04:11, Todd Lipcon wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/1688/

          -----------------------------------------------------------

          (Updated 2011-08-31 19:04:11)

          Review request for Todd Lipcon.

          Summary

          -------

          Posting patch on review-board for easy review

          This addresses bug HDFS-2018.

          http://issues.apache.org/jira/browse/HDFS-2018

          Diffs

          -----

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java d72509c

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java 3cac667

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Checkpointer.java f754100

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java 8921bc0

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java 532b2f2

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java 52a3dd4

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java 495c42e

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java db98569

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java 8b25901

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImagePreTransactionalStorageInspector.java cec2eef

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageStorageInspector.java 65bfa0a

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageTransactionalStorageInspector.java 0814a14

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java 991d7f5

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java d62aaa7

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java 511e9c1

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java 4a8edb8

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckPointForSecurityTokens.java 4f698c0

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java 9e55946

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java a673c5f

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java a53a0bf

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageStorageInspector.java 113dcbc

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java 748caf4

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameEditsConfigs.java 31a7777

          Diff: https://reviews.apache.org/r/1688/diff

          Testing

          -------

          Thanks,

          Todd

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1688/
          -----------------------------------------------------------

          Review request for Todd Lipcon.

          Summary
          -------

          Posting patch on review-board for easy review

          This addresses bug HDFS-2018.
          http://issues.apache.org/jira/browse/HDFS-2018

          Diffs


          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java d72509c
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java 3cac667
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Checkpointer.java f754100
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogBackupInputStream.java 8921bc0
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java 532b2f2
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogInputStream.java 52a3dd4
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java 495c42e
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java db98569
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java 8b25901
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImagePreTransactionalStorageInspector.java cec2eef
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageStorageInspector.java 65bfa0a
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageTransactionalStorageInspector.java 0814a14
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java 991d7f5
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java d62aaa7
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSRollback.java 511e9c1
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSImageTestUtil.java 4a8edb8
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckPointForSecurityTokens.java 4f698c0
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java 9e55946
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogFileOutputStream.java a673c5f
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java a53a0bf
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageStorageInspector.java 113dcbc
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java 748caf4
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameEditsConfigs.java 31a7777

          Diff: https://reviews.apache.org/r/1688/diff

          Testing
          -------

          Thanks,

          Todd

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12492437/HDFS-2018.diff
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          org.apache.hadoop.hdfs.TestDfsOverAvroRpc
          org.apache.hadoop.hdfs.server.blockmanagement.TestHost2NodesMap
          org.apache.hadoop.hdfs.server.datanode.TestReplicasMap
          org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1181//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1181//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1181//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          renamed countTransactionInInprogress to validateEditLogFile.

          Jitendra Nath Pandey added a comment -

          The latest patch looks good to me.
          Minor comment:
          countTransactionsInInprogress in FileJournalManager is only doing validation, can we rename it to something like validateEditLogFile?

          +1.
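
          The rename discussed above reflects the recovery approach in this issue: an in-progress log is validated by counting complete transactions rather than trusting the file length. The sketch below illustrates that idea with an invented record layout (8-byte txid, 4-byte length, payload) — the names `EditLogValidator` and `validate` and the on-disk format are purely hypothetical, not the actual HDFS code:

          ```java
          import java.io.ByteArrayInputStream;
          import java.io.DataInputStream;
          import java.io.EOFException;
          import java.io.IOException;

          // Toy edit-log validator: counts whole transactions and remembers the
          // offset of the last complete one, instead of assuming the file
          // length marks a valid end of log.
          class EditLogValidator {
            static final class Result {
              final long numTransactions;
              final long validLength; // bytes known to hold complete txns
              Result(long n, long len) { numTransactions = n; validLength = len; }
            }

            // Record format (illustrative only): 8-byte txid, 4-byte length, payload.
            static Result validate(byte[] log) throws IOException {
              DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
              long count = 0, validLength = 0;
              try {
                while (true) {
                  in.readLong();              // txid
                  int len = in.readInt();     // payload length
                  if (len < 0 || in.skipBytes(len) != len) {
                    break;                    // truncated payload: stop here
                  }
                  count++;
                  validLength += 8 + 4 + len;
                }
              } catch (EOFException e) {
                // Hit a partial record; everything before validLength is good.
              }
              return new Result(count, validLength);
            }
          }
          ```

          A recovery step can then truncate the file to `validLength`, discarding the torn tail of the last, partially written transaction.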

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12492273/HDFS-2018.diff
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          org.apache.hadoop.hdfs.TestDfsOverAvroRpc
          org.apache.hadoop.hdfs.server.blockmanagement.TestHost2NodesMap

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1178//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1178//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1178//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12492209/HDFS-2018.diff
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          org.apache.hadoop.hdfs.TestDfsOverAvroRpc
          org.apache.hadoop.hdfs.server.blockmanagement.TestHost2NodesMap

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1177//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1177//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1177//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          Had uploaded the wrong patch last time, with one fix missing. This one should pass as well as trunk does.

          On trunk the following tests have failures;
          TestDfsOverAvroRpc
          TestHost2NodesMap
          TestReplicasMap
          TestOfflineEditsViewer

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12492100/HDFS-2018.diff
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          org.apache.hadoop.hdfs.TestDfsOverAvroRpc
          org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs
          org.apache.hadoop.hdfs.server.blockmanagement.TestHost2NodesMap
          org.apache.hadoop.hdfs.server.datanode.TestReplicasMap

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1176//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1176//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1176//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          Fixed up exception problems.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12492083/HDFS-2018.diff
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          org.apache.hadoop.hdfs.TestDfsOverAvroRpc
          org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs
          org.apache.hadoop.hdfs.server.blockmanagement.TestHost2NodesMap
          org.apache.hadoop.hdfs.TestDFSRollback
          org.apache.hadoop.hdfs.server.datanode.TestReplicasMap
          org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1175//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1175//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1175//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          I've updated this patch to make the recovery step explicit. Also, I've removed the CorruptionException. From the point of view of FSEditLog it doesn't matter if a JournalManager has gaps, as long as it can get enough transactions to reach the max seen txid as stored in max_seen. I've not run the commit test on this because I need to figure out how to do that with mvn. However, TestEditLog, TestStorageRestore, TestFileJournalManager, TestCheckpoint and TestBackupNode all pass cleanly.

          Todd Lipcon added a comment -

          Yes, I think that path sounds good. There were a few style nits, etc, with the original patch as well, but I'll let you make a patch rev before reviewing.

          Jitendra Nath Pandey added a comment -

          > 3. Todd raised some concerns about removing the caching introduced in HDFS-2018, Sanjay and Jitendra are going to
          > take a look and comment.
          I discussed with Sanjay and Suresh regarding the caching of edit log file names in FileJournalManager. We agreed that the cache is not really needed in the FileJournalManager, because there is no significant performance concern. The cache of valid edit log files in FileJournalManager is used only for recovery before a read. I think this cache can be removed once the explicit recoverLogs API is introduced. This should address Todd's concern.

          So the following should let us reach an agreeable patch:
          1. Modify Ivan's patch to introduce recoverLogs API.
          2. The point (1) above will let us remove the cache from FileJournalManager.
          3. As agreed before getEditLogManifest should be moved to JournalAdminInterface, but we can leave it for later as mentioned in Eli's comment, so no changes for this in HDFS-2018.

          Eli Collins added a comment -

          Here are the notes from the call we had today:

          1. We discussed whether to have an explicit recoverLogs API in JournalManager. Todd explained the motivation for the API - with shared storage, the standby wants to call getInputStream without necessarily initiating recovery, so the explicit API is needed.
          2. We discussed whether/how to expose the segments. Jitendra discussed one option: hiding the segments via having a JournalAdminInterface with getEditLogManifest method that manages transfer from primary to the standby and checkpointer. The JournalManager and the checkpointer both implement this interface. Todd discussed an intermediate step, using the tuples of log segments and delaying JournalAdminInterface to 1580. Sanjay proposed trying to get rid of segments in 1580 and doing that incrementally in a separate jira, people thought that was reasonable.
          3. Todd raised some concerns about removing the caching introduced in HDFS-2018, Sanjay and Jitendra are going to take a look and comment.
          Todd Lipcon added a comment -

          Having the 2NN use an EditLogInputStream interface to fetch edits seems a little "off" since EditLogInputStream only passes along logical operations and not bytes. So, the 2NN will be deserializing all the transactions. I guess in some ways it is more elegant, though. Do you foresee the extra deserialize/serialize step during edits transfer being expensive?

          If this has been implemented, could you point me to the branch where it is? As is, in the patches I could find, edits transfer is left alone and it breaks a lot of the abstractions that are being pushed for.

          Sanjay Radia added a comment -

          In the case of the BN, the edits will roll in sync since the 'closeEdit' transaction is visible to the BN.
          For the checkpointer it is desirable to have them in sync. The checkpointer can request a stream of edits following a tx id (the one beyond the fsimage checkpoint) and similarly roll on each 'closeEdit' transaction. The CN will do that anyway since it is essentially the BN code. The 2NN should request a stream and can roll on each 'closeEdit' transaction.

          The JournalInterface can provide a "remote impl" that gets the edits from the NN.

          Todd Lipcon added a comment -

          Just wanted to write some thoughts on this before the call tomorrow:

          • It's unclear how the "segmentless" design started in this JIRA and continued in HDFS-1580 deals with edits transfer. Several places above Jitendra and Ivan have referred to some plan to remove edits transfer and instead use rsync or scp - that's certainly an option, but not one that's been decided in public. I'd like to understand better how we plan to implement edits transfer with this and HDFS-1580. I brought this up several times in HDFS-1580's comments, but most of the answers I've seen have been "BookKeeper doesn't need edits transfer" - that's fine, but we need the existing file based setup to continue to work.

          If the plan is to go entirely segmentless, then I think we should go whole-hog with it and completely abandon the prior invariant that the storage directories on different nodes will always roll together. I found this invariant really nice operationally, but if rolling is meant to be pushed down to an implementation detail of the specific journal, then I don't see how we can enforce the invariant.

          By going whole-hog, I mean that the "edits fetching" in the 2NN/CN/BN would be rewritten to simply ask for a stream of edits, and it would have no idea of the boundaries of the different files.

          In the current patches proposed here, HDFS-1580, and HDFS-2158, the edits fetching (getEditsLogManifest) is left as a kind of strange second code path that jumps around the otherwise nice abstractions.

          It seems to me that one of the primary objections raised here is that there are already a bunch of other patches queued up on top of this one. Do you have a github branch available that would help others see where it's going? eg the latest patch on HDFS-1580 is 10 weeks old.

          Eli Collins added a comment -

          Sanjay, Todd, Jitendra, Suresh and myself (Ivan is on vacation) are planning on discussing this on the phone next week, Wednesday 2pm PST. Anyone is welcome to join. The conf # is +1 (805) 309-2350 and the conf ID is 745-745. We'll post notes to the jira. Can continue to discuss on jira in the meantime of course.

          Sanjay Radia added a comment -

          HDFS-1073 has a sequence of patches that move us towards the final solution:

          • Todd's changes added a concept called loadPlan, plus segments that were visible, which I did not like.
          • HDFS-2018 makes improvements - it removes loadPlan, but we are debating parts of it.
          • HDFS-1580 further cleans this up and hides the edit log segments inside the journal interface.

          We need to look at the above as set of improvements and decide on whether we like the final solution.
          I let 1073 get merged in because the final solution was acceptable.

          Todd, is the final solution after HDFS-1580 acceptable to you? Looks like you want to expose segments based on some of your comments in this Jira. Is that indeed the main area of disagreement?

          Jitendra Nath Pandey added a comment -

          Todd,
          I think there are two fundamental differences in the two patches.

          1) EditLogReference or TransactionRanges in the API
          >I found the code to implement this "merge distinct sets of ranges to cover the target range" algorithm a lot easier to understand when it's dealing
          > with the full List<EditLogReference>, rather than having to keeping going back to each JournalManager in turn as it builds up a list.
          In HDFS-2158, I intend to simplify selectInputStream. JournalSet will hide away all the journals and FSEditLog will just know one journal. There is a patch uploaded on 2158. It is a bit old but will give you the idea. Will that address your concern?

          2) Ivan's patch attempts to hide log recovery from FSEditLog, however your patch makes it explicit.
          Can this be addressed in a separate jira, in the context of HA, because we run into the case of concurrent reader/writer only in the context of HA?

          Todd Lipcon added a comment -

          The patch adds EditLogReference as a new concept which actually complicates the API. Because now, in FSEditLog and FSImage, where JournalManager is being used, the code needs to use the new interface.
          Ivan's patch uses EditLogInputStream which has been in use already.

          The issue with using EditLogInputStream is that this interface represents an opened stream. Imagine if listFiles() on a directory returned a List<InputStream> – that would obviously be a bad API. So I feel the same way about selectInputStreams – you should be able to enumerate which streams you plan to open before going and opening them.

          Doing this also allows getRemoteEditLogs to share code with the startup – it's obvious that getRemoteEditLogs needs to list streams without actually opening them. So might as well reuse that same interface - the JournalManager just needs to expose its list of available log segments, and FSEditLog is responsible for using that either for loading or for edits transfer.
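          The enumerate-then-open shape argued for above can be sketched as a toy model. The class names below (EditLogReference, ToyStream) are illustrative stand-ins, not the actual HDFS types; ToyStream plays the role of EditLogInputStream:

```java
import java.util.List;

public class EnumerateThenOpenSketch {
  // Stand-in for EditLogInputStream: only exists once open() is called.
  static class ToyStream {
    final long firstTxId;
    ToyStream(long firstTxId) {
      this.firstTxId = firstTxId;
    }
  }

  // An unopened handle, analogous to a file name returned by listFiles():
  // cheap to enumerate, side-effect free, opened only on demand.
  static class EditLogReference {
    final long firstTxId;
    final long lastTxId;
    EditLogReference(long firstTxId, long lastTxId) {
      this.firstTxId = firstTxId;
      this.lastTxId = lastTxId;
    }
    ToyStream open() {
      return new ToyStream(firstTxId);
    }
  }

  // Listing returns references only; nothing is opened yet.
  static List<EditLogReference> getEditLogs() {
    return List.of(new EditLogReference(1, 100), new EditLogReference(101, 250));
  }

  public static void main(String[] args) {
    List<EditLogReference> refs = getEditLogs(); // plan which logs to read...
    ToyStream in = refs.get(1).open();           // ...then open only what is needed
    System.out.println(in.firstTxId);            // 101
  }
}
```

          The same listing could then back both startup loading and getRemoteEditLogs, since neither needs an opened stream up front.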

          EditLogReference actually leaks the idea of segments in different parts of the code, and with another name.

          I'm happy to rename this to TransactionRange or something instead. Regardless of the backend storage, we need to think about each journal manager providing a set of ranges of transaction IDs. Then, the algorithm for startup is to simply take the various ranges and merge them to make a complete range.

          I found the code to implement this "merge distinct sets of ranges to cover the target range" algorithm a lot easier to understand when it's dealing with the full List<EditLogReference>, rather than having to keep going back to each JournalManager in turn as it builds up a list.
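          The merge over the full list can be sketched as a greedy selection: at each step, take the range that starts at the next needed txid and extends furthest. This is a toy model under the assumption that each range exposes its first and last transaction id; TransactionRange here is a hypothetical stand-in for EditLogReference:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RangeMergeSketch {
  // Hypothetical stand-in for the EditLogReference/TransactionRange interface.
  static class TransactionRange {
    final long firstTxId;
    final long lastTxId;
    TransactionRange(long firstTxId, long lastTxId) {
      this.firstTxId = firstTxId;
      this.lastTxId = lastTxId;
    }
  }

  // Greedily pick, at each step, the range starting at the next needed txid
  // that extends furthest; stop when no range starts at that txid (a gap).
  static List<TransactionRange> selectRanges(List<TransactionRange> all, long fromTxId) {
    List<TransactionRange> plan = new ArrayList<>();
    while (true) {
      final long needed = fromTxId;
      TransactionRange best = all.stream()
          .filter(r -> r.firstTxId == needed)
          .max(Comparator.comparingLong(r -> r.lastTxId))
          .orElse(null);
      if (best == null) {
        break; // no journal can supply 'needed': the plan ends here
      }
      plan.add(best);
      fromTxId = best.lastTxId + 1; // continue from the next transaction
    }
    return plan;
  }

  public static void main(String[] args) {
    // Two journals contribute overlapping segments; the plan covers 1..25
    // with two ranges, preferring the longer [1,10] over [1,5].
    List<TransactionRange> plan = selectRanges(List.of(
        new TransactionRange(1, 10),
        new TransactionRange(1, 5),
        new TransactionRange(11, 25)), 1);
    System.out.println(plan.size());          // 2
    System.out.println(plan.get(1).lastTxId); // 25
  }
}
```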

          The recoverUnclosedLogs should not be implemented by a journal. A simpler JournalManager interface is better, where it just knows what is in progress and what is finalized.

          The issue is that we sometimes need to read from an in-progress log without finalizing it: for example in the standby master in an HA setup. If the getInputStream API finalizes the log, as you have it, then this becomes impossible, since we'll end up finalizing a log under the active writer. Like you said earlier, there are distinct read operations and write operations – getInputStream is a read operation. recoverLogs is a write operation. Therefore only the active master should call recoverLogs, and a standby using getInputStream should not have any mutative effect on the underlying store.

          The JournalManager must know whether it is a writer or not to verify whether it can create OutputStreams or not. Since we already have this information, we can use it to inform our decision whether to finalize recovered in-progress logs or not.

          I'm not sure why the JournalManager needs to have this information. FSEditLog is the one responsible for coordinating the lifecycle (eg opening logs for writing or reading). So, FSEditLog has the state, and can use it in one single place to determine whether to ask the JournalManagers to recover unclosed logs.

          Similarly, FSEditLog can use this API whenever it detects that a JournalManager which previously had failed has "come back" – that's a good time to finalize whatever logs might have been truncated by the previous failure, in the future.
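          The division of responsibility being argued for here might look like the following toy sketch, where FSEditLog alone decides when recovery runs and a standby's read path never mutates the store. Class and method names are illustrative, not the actual HDFS code:

```java
import java.util.List;

public class RecoveryOwnershipSketch {
  // Recovery is a write operation, so it hangs off an explicit call that
  // only the active writer's lifecycle ever reaches.
  interface JournalManager {
    void recoverUnfinalizedSegments();
  }

  static class FSEditLog {
    private final List<JournalManager> journals;
    private boolean isActiveWriter = false;

    FSEditLog(List<JournalManager> journals) {
      this.journals = journals;
    }

    // Called only when this NN becomes the active writer: recovery of
    // unclosed logs happens exactly here, in one place.
    void openForWrite() {
      isActiveWriter = true;
      for (JournalManager j : journals) {
        j.recoverUnfinalizedSegments();
      }
    }

    // A standby's read path: no recovery, no mutation of the store.
    void openForRead() {
      // intentionally side-effect free
    }

    boolean isActiveWriter() {
      return isActiveWriter;
    }
  }

  public static void main(String[] args) {
    int[] recoveries = {0};
    JournalManager jm = () -> recoveries[0]++;
    FSEditLog log = new FSEditLog(List.of(jm));
    log.openForRead();   // standby: no recovery triggered
    log.openForWrite();  // active: recovery runs once per journal
    System.out.println(recoveries[0]); // 1
  }
}
```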

          The new approach also retains getInProgressInputStream() which is only used in a special case in BackupNode. We shouldn't have methods in the JournalStream API only for corner cases like this. My patch gets rid of this. It should be possible to do the same with the other approach.

          Not sure it's really a "special case" in the BackupNode - it's the only way the BN can get back in sync with the primary. We will also use it in the StandbyNode, I imagine - in both cases we need to be able to read from some logs that may be in the process of being written, in order to stay in sync.

          getNumberOfTransactions could be replaced with a getTransactionRanges call. The editlog could then select the list of transaction ranges it wants to load. It then calls getInputStream() on the JournalManagers to get a stream starting with the firstTxId of the range. I had considered an approach vaguely similar before, but didn't implement it.

          Yes, this is essentially what getEditLogs is and what selectLogsToRead is doing. I'm happy to rename EditLogReference to TransactionRange and getEditLogs to getReadableTransactionRanges.

          Removing the openForEdit() call makes it equivalent to RemoteEditLog, so RemoteEditLog should be removed

          RemoteEditLog is a "wire type" for transfer to the 2NN and other checkpointers. Keeping the protocol types distinct from the internal types is a best practice people have been encouraging recently.

          Ivan Kelly added a comment -

          No, fencing is to keep two NNs from writing at the same time to a storage directory. This is about one NN reading while the other writes – in your design, the reader ends up finalizing log segments without explicitly saying "do recovery". Recovery is an operation that should only be done by the active (writer). So making it an explicit API makes more sense.

          What is recovery but a write operation? I disagree that an explicit API makes more sense. The JournalManager must know whether it is a writer or not to verify whether it can create OutputStreams or not. Since we already have this information, we can use it to inform our decision whether to finalize recovered in-progress logs or not.

          The new approach also retains getInProgressInputStream() which is only used in a special case in BackupNode. We shouldn't have methods in the JournalStream API only for corner cases like this. My patch gets rid of this. It should be possible to do the same with the other approach.

          I agree with Jitendra about the way EditLogReference is used. Streams should be the unit of currency used in FSImage at least. FSEditLog should return a list of streams from which the data can be read. getNumberOfTransactions could be replaced with a getTransactionRanges call. The editlog could then select the list of transaction ranges it wants to load. It then calls getInputStream() on the JournalManagers to get a stream starting with the firstTxId of the range. I had considered an approach vaguely similar before, but didn't implement it.

          Also, the name EditLogReference is meaningless. It should be called TransactionRange or similar. Removing the openForEdit() call makes it equivalent to RemoteEditLog, so RemoteEditLog should be removed.

          public interface JournalManager {
              List<TransactionRange> getTransactionRanges(long fromTxId);
              EditLogInputStream getInputStream(long fromTxId);
          }

          public class FSEditLog {
              List<EditLogInputStream> selectInputStreams(long fromTxId) {
                  List<EditLogInputStream> streams = Lists.newArrayList();
                  Map<TransactionRange, JournalManager> rangeMap = Maps.newHashMap();
                  Multimap<Long, TransactionRange> rangesByStartTx = ArrayListMultimap.create();

                  for (JournalManager j : journals) {
                      for (TransactionRange r : j.getTransactionRanges(fromTxId)) {
                          rangeMap.put(r, j);
                          rangesByStartTx.put(r.getFirstTxId(), r);
                      }
                  }

                  while (true) {
                      Collection<TransactionRange> ranges = rangesByStartTx.get(fromTxId);
                      if (ranges.isEmpty()) {
                          break;
                      }
                      TransactionRange bestRange = Collections.max(ranges, TransactionRange.COMPARATOR);
                      assert bestRange.getLastTxId() >= fromTxId;
                      streams.add(rangeMap.get(bestRange).getInputStream(bestRange.getFirstTxId()));
                      fromTxId = bestRange.getLastTxId() + 1;
                  }
                  return streams;
              }
          }

          I'm on holiday from today until the 29th, which is why I was eager to get this resolved quickly, as it has been holding up Jitendra. I won't be able to work on anything until I get back, so if one of you does continue forward and chooses the different approach, I'd like to see the following changes.

          • Recovery should not be explicit in the API
          • Get rid of getInProgressInputStream
          • Rename EditLogReference to TransactionRange, remove openForEdit & merge with RemoteEditLog.
          Jitendra Nath Pandey added a comment -

          >I think it's a little easier to understand these APIs.
          I don't think so. The patch adds EditLogReference as a new concept which actually complicates the API. Because now, in FSEditLog and FSImage, where JournalManager is being used, the code needs to use the new interface.
          Ivan's patch uses EditLogInputStream which has been in use already.

          In HDFS-1580, we discussed to move towards modeling journals as a single continuous stream of transactions, instead of segments. This patch doesn't remove segments but we don't want to proliferate the use of segments. EditLogReference actually leaks the idea of segments in different parts of the code, and with another name.

          > treats log recovery as an explicit step at startup - it's good to make it explicit since we need to not do recovery when a NN starts up in standby mode, for example.
          The recoverUnclosedLogs should not be implemented by a journal. A simpler JournalManager interface is better, where it just knows what is in progress and what is finalized.

          > "edits transfer mechanism"
          I think we should leave that outside the scope of this jira, because we are not even sure whether it will be needed in future as pointed out by Ivan.

          > I'm offering to do the work of modifying this to something I find more acceptable.
          It is important to identify and discuss the objections against the original patch. If you upload a new patch with fundamental changes, it is hard for others to guess what were your objections with the original patch.

          Todd Lipcon added a comment -

          This patch passes all the tests, has javadoc, etc.

          Todd Lipcon added a comment -

          Btw, some of the changes I made actually go back to things you had in your original patch. For example, the recoverUnclosedStreams API instead of doing lazy recovery-on-read. As you said over in HDFS-1580:

          Finalizing in getNumTransactions is a bit messy, because it's difficult to tell if an inprogress file is actually inprogress or the result of a crashed namenode. I tried it in HDFS-2018, but ended up creating an explicit recoverUnclosedStreams() call

          Todd Lipcon added a comment -

          The caching in FileJournalManager is only for inprogress logs, which will be recovered the first time they are read while not being written to. This is the only caching. I don't think scanning a directory during load time is going to cause a significant performance hit.

          It's not a performance thing - I'm just pointing out that it's a bit ugly and harder to understand this way. The validation/recovery at read time is also quite messy IMO - an API that purports to "select a stream to read" should not have a side effect of renaming it, etc.

          Proper fencing is the correct way to handle this.

          No, fencing is to keep two NNs from writing at the same time to a storage directory. This is about one NN reading while the other writes – in your design, the reader ends up finalizing log segments without explicitly saying "do recovery". Recovery is an operation that should only be done by the active (writer). So making it an explicit API makes more sense.

          Other journal types shouldn't need any edit transfer mechanism, as anything that isn't a local file will be shared storage.

          In the case of shared storage, the "edits transfer mechanism" is for the NN to return a pointer to the file in its shared location (eg a pointer to the BK ledger, the file on NFS, whatever). Having the NN provide this manifest to any standbys is still helpful, and makes files less of a special case. I'm not proposing to attack this right now, but it does improve the current code where we have "instanceof" checks.

          This patch has been held up for 6 weeks for various reasons, and changing the approach now is just going to delay it again

          It's a 100KB+ patch that both refactors and changes a lot of implementation. We separated it into smaller patches, and those smaller patches have made steady progress over these 6 weeks. I'm offering to do the work of modifying this to something I find more acceptable.

          It would have been nice if this alternative approach had been proposed a month ago

          I believe I did express my concern about these APIs both here and on HDFS-1580. The size of the patch made it difficult to express exactly what the issues would be until the other refactors had gotten done.

          I do appreciate the frustration of having a patch take a long time to go in, but I'm trying to work with you here...

          Ivan Kelly added a comment -

          no need for the more complicated caching in FileJournalManager, since we only scan the directory once

          The caching in FileJournalManager is only for inprogress logs, which will be recovered the first time they are read while not being written to. This is the only caching. I don't think scanning a directory during load time is going to cause a significant performance hit.

          treats log recovery as an explicit step at startup - it's good to make it explicit since we need to not do recovery when a NN starts up in standby mode, for example.

          Proper fencing is the correct way to handle this.

          the EditLogReference interface will also make it easier to allow other types of journal managers to participate in edits-transfer, I think.

          Other journal types shouldn't need any edit transfer mechanism, as anything that isn't a local file will be shared storage.

          Frankly, I don't see that this alternative approach brings enough to justify holding off this for longer. This patch has been held up for 6 weeks for various reasons, and changing the approach now is just going to delay it again. Jitendra and I have other patches which have been held up waiting for this. It would have been nice if this alternative approach had been proposed a month ago.

          Todd Lipcon added a comment -

          Here's a rough cut of the way I'm thinking about this. I didn't update TestFileJournalManager, but the other tests compile and pass (TestBackupNode, TestCheckpoint, TestEditLog).

          I think it's a little easier to understand these APIs, and it's ~100 fewer lines of code.

          Some of the advantages IMO:

          • no need for the more complicated caching in FileJournalManager, since we only scan the directory once
          • treats log recovery as an explicit step at startup - it's good to make it explicit since we need to not do recovery when a NN starts up in standby mode, for example.
          • the EditLogReference interface will also make it easier to allow other types of journal managers to participate in edits-transfer, I think.
          Todd Lipcon added a comment -

          right, but when loading a sequence of logs, it will call selectStream() for each starting txid in the sequence. Then, the selectStream() call causes each journal to have to look at its contents. Then, FileJournalManager has a lot of caching in it to try to make this more efficient, from what I can tell. I'm about halfway through the alternate approach - I think it will end up being fewer lines of code and a bit simpler.

          Jitendra Nath Pandey added a comment -

          >The APIs here feel a little strange, how FSEditLog has to keep going back to JournalManager to find the next stream, and
          >then JournalManager is calling "listFiles" over and over again.
          I didn't get the concern here. selectStream calls getNumberOfTransactions on each journal and then chooses the best one. Am I looking at a different part of the code from what you are referring to?

          Todd Lipcon added a comment -

          The APIs here feel a little strange, how FSEditLog has to keep going back to JournalManager to find the next stream, and then JournalManager is calling "listFiles" over and over again.

          Let me take a whack at a slightly different API and upload it here for comparison – my idea is to first iterate over the JournalManagers, and have them return List<EditLogReference> (where EditLogFile implements EditLogReference), and then come up with a "plan" as before. Then it can iterate over the EditLogReferences and call open() on each to obtain an EditLogFileInputStream... or something of that nature

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12490157/HDFS-2018.diff
          against trunk revision 1156490.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.blockmanagement.TestReplicationPolicy
          org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
          org.apache.hadoop.hdfs.server.namenode.TestNNThroughputBenchmark
          org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings
          org.apache.hadoop.hdfs.TestHDFSServerPorts

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1089//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1089//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1089//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          Renamed CorruptionException to TransactionGapException.

          Jitendra Nath Pandey added a comment -

          >Perhaps it would be better named TransactionGapException?
          I agree, CorruptionException is misleading.

          >getLastTxId is required by FSEditLog#selectInputStreams to make sure a continuous set of streams is selected. >getNumberOfTransactions wouldn't work here, because that counts the how many transactions are available on a journal manager >from that point, not how much is in the next segment.
          Now I see the point, getLastTxId is different from getNumberOfTransactions because we are reading a segment at a time.

          Ivan Kelly added a comment -

          The latest patch fixes the findbugs and javac warnings.

          Regarding CorruptionException, it is needed: if we merely return 0 when we can't get a segment, and all JournalManagers return 0, FSEditLog has no way of knowing whether it has reached the end of the logs or whether a gap exists in all of them. Perhaps it would be better named TransactionGapException?
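          The ambiguity described above can be made concrete with a small sketch. This is a hypothetical illustration (the Journal class and segment layout here are invented for the example; only the TransactionGapException name comes from the discussion): without the exception, "0 transactions" at the start of a gap is indistinguishable from having cleanly reached the end of all logs.

```java
/** Minimal sketch of why returning 0 is ambiguous: a journal that holds
 *  later transactions but not fromTxId has a gap, which is a different
 *  situation from simply having no transactions past fromTxId. */
public class GapVsEndSketch {

  static class TransactionGapException extends Exception {
    TransactionGapException(String msg) { super(msg); }
  }

  /** Journal holding contiguous segments as [start, end] txid pairs. */
  static class Journal {
    final long[][] segments;
    Journal(long[][] segments) { this.segments = segments; }

    long getNumberOfTransactions(long fromTxId) throws TransactionGapException {
      long count = 0;
      long expected = fromTxId;
      for (long[] seg : segments) {
        if (seg[0] > expected) break;          // hit a gap, stop counting
        if (seg[1] >= expected) {
          count += seg[1] - expected + 1;
          expected = seg[1] + 1;
        }
      }
      if (count == 0 && lastTxId() >= fromTxId) {
        // We hold later transactions but not fromTxId: a gap, not the end.
        throw new TransactionGapException("gap at txid " + fromTxId);
      }
      return count;
    }

    long lastTxId() {
      return segments.length == 0 ? 0 : segments[segments.length - 1][1];
    }
  }

  public static void main(String[] args) throws Exception {
    // Asking past the end of a complete journal: 0, meaning "no more logs".
    Journal complete = new Journal(new long[][] {{1, 100}, {101, 200}});
    if (complete.getNumberOfTransactions(201) != 0) throw new AssertionError();

    // Asking inside a hole [101,150]: the journal has more txns later,
    // so it signals a gap instead of returning an ambiguous 0.
    Journal gappy = new Journal(new long[][] {{1, 100}, {151, 200}});
    boolean sawGap = false;
    try {
      gappy.getNumberOfTransactions(101);
    } catch (TransactionGapException e) {
      sawGap = true;
    }
    if (!sawGap) throw new AssertionError();
    System.out.println("end-of-log returns 0; a gap raises TransactionGapException");
  }
}
```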

          Ivan Kelly added a comment -

          Fixed DFSRollback issue. Exception text has changed.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12489850/HDFS-2018.diff
          against trunk revision 1155998.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 25 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 36 javac compiler warnings (more than the trunk's current 35 warnings).

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.blockmanagement.TestReplicationPolicy
          org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery
          org.apache.hadoop.hdfs.server.datanode.TestDataDirs
          org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
          org.apache.hadoop.hdfs.server.namenode.TestGetImageServlet
          org.apache.hadoop.hdfs.server.namenode.TestINodeFile
          org.apache.hadoop.hdfs.server.namenode.TestNNLeaseRecovery
          org.apache.hadoop.hdfs.server.namenode.TestNNThroughputBenchmark
          org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings
          org.apache.hadoop.hdfs.TestDFSRollback
          org.apache.hadoop.hdfs.TestHDFSServerPorts

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1071//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1071//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1071//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          >getFirstTxnId and getLastTxnId seem a bit redundant in the EditLogInputStream interface. The first txnid must be the last read plus one. The last txnid can be obtained using getNumberOfTransactions.
          >Similarly in the constructor of EditLogFileInputStream. It might lead to inconsistent use of EditLogFileInputStream, for example if the file contents don't match the transaction ids being passed.

          getLastTxId is required by FSEditLog#selectInputStreams to make sure a continuous set of streams is selected. getNumberOfTransactions wouldn't work here, because that counts how many transactions are available on a journal manager from that point, not how many are in the next segment.

          Take the scenario where you have two logs with txns A[[1,100][101,140][201,300]] & B[[1,100][101,200][201,240]]. A has had an error at txn 140, so that stream is incomplete. B has had an error at txn 240, so that stream is incomplete.

          Now if you used getNumberOfTransactions(101), for B you get 140, and for A you get 40. So the stream from B is selected. But we can't read all 140; we must only read the next segment, as we can't start reading halfway through a segment (actually you can since HDFS-2187, but that was done after this and it's still undesirable). Since we are selecting all the streams before starting to read them, we can't wait until we've read to the end of the stream to get the last txid. So getLastTxId() is useful here.

          getFirstTxId() is very useful in BackupImage to find the current inprogress stream.
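          The A/B scenario above can be sketched as follows. This is a hypothetical illustration (the Segment class and helper names are invented; the segment layout and the 140-vs-40 counts come from the comment): counting all remaining transactions picks the right journal, but the stream opened from it still has to stop at its own segment's last txid.

```java
/** Sketch of the example above: B offers 140 txns from 101 versus A's 40,
 *  so B wins, but the stream at 101 only covers its segment [101,200] -
 *  which is why the stream itself must expose getLastTxId(). */
public class SegmentSelectionSketch {

  static class Segment {
    final long first, last;
    Segment(long first, long last) { this.first = first; this.last = last; }
  }

  /** Count transactions readable from fromTxId across contiguous segments. */
  static long numberOfTransactions(Segment[] log, long fromTxId) {
    long count = 0, expected = fromTxId;
    for (Segment s : log) {
      if (s.first > expected) break;  // gap: stop counting
      if (s.last >= expected) { count += s.last - expected + 1; expected = s.last + 1; }
    }
    return count;
  }

  /** The segment starting at fromTxId, i.e. what one stream would cover. */
  static Segment segmentAt(Segment[] log, long fromTxId) {
    for (Segment s : log) if (s.first == fromTxId) return s;
    return null;
  }

  public static void main(String[] args) {
    Segment[] a = { new Segment(1, 100), new Segment(101, 140), new Segment(201, 300) };
    Segment[] b = { new Segment(1, 100), new Segment(101, 200), new Segment(201, 240) };

    // Journal selection: B has 140 txns available from 101, A only 40.
    if (numberOfTransactions(a, 101) != 40) throw new AssertionError();
    if (numberOfTransactions(b, 101) != 140) throw new AssertionError();

    // But the stream opened at 101 runs only to its segment's end (200),
    // after which selection happens again from txid 201.
    Segment chosen = segmentAt(b, 101);
    if (chosen.last != 200) throw new AssertionError();
    System.out.println("read [101," + chosen.last + "], then select again from 201");
  }
}
```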

          Jitendra Nath Pandey added a comment -

          A few comments:
          getFirstTxnId and getLastTxnId seem a bit redundant in the EditLogInputStream interface. The first txnid must be the last read plus one. The last txnid can be obtained using getNumberOfTransactions.
          Similarly in the constructor of EditLogFileInputStream. It might lead to inconsistent use of EditLogFileInputStream, for example if the file contents don't match the transaction ids being passed.

          > what's the story on exclusive locking of journal managers after this? Do we still assume that they'll be locked by FSImage? Or should we push
          > down locking to the journal managers as well?
          I think JournalManagers should not be assumed to be locked by FSImage. FSEditLog should manage the synchronization for JournalManagers where they interact with namesystem.

          > not sure I follow why getNumberOfTransactions throws a CorruptionException if you've asked for a transaction ID in the middle of a gap. Isn't
          > it more consistent to just return 0? ie that journal just wasn't being written to for that txid?
          Is it intended to distinguish between the cases when there is a gap and when the journal doesn't have any more transactions? It may be useful to distinguish between the two, but I agree that CorruptionException is not very convincing, because the journal is not really corrupt; it can still serve valid ranges of transactions.

          Ivan Kelly added a comment -

          Rebased on top of latest HDFS-2227 patch.

          Ivan Kelly added a comment -

          New patch. Applies on top of my HDFS-2227 patch. This is the third part of the split proposed in Todd's comment above, which replaces the LoadPlan with FSEditLog#selectInputStreams.

          For review it may be easier to look at my github tree, as it shows this in three changesets.
          1. Adding the functionality to FSEditLog & FileJournalManager to select input streams.
          2. Removal of LoadPlan.
          3. Fix up of recovery.

          https://github.com/ivankelly/hadoop-common/tree/HDFS-2018-loading

          Todd Lipcon added a comment -

          I made it through reviewing part of the most recent patch. Here are some review comments (though I still have more to look at):

          Specific issues:

          • needToSave() method has been reverted back to one with a TODO that always returns false
          • new getEditLogManifest implementation:
            • why re-create FileJournalManagers for each directory instead of looping over the existing journals array?
            • in testEditLogManifest, shouldn't you also test requesting a manifest mid-log-segment, since the code seems to try to deal with this?
            • the conditions for determining if a log is "better" than the previous one don't seem quite right here:
                        } else if (candidate.getStartTxId() < bestlog.getStartTxId()) {
                          bestlog = candidate;
              

              ... because any transactions that come before fromTxId are dead weight which will have to be skipped over.

            • please add some comments to these if clauses explaining the rationale - it's slightly tricky code.
            • the catch clause in this function seems to have the wrong error message, since there shouldn't be any counting going on. It also should include the exception as the second argument to the log message, so a trace is logged
            • seems strange that getLogFiles() throws IOE if the parameter is in the middle of segment, when the rest of the code now appears to try to deal with this case. Which is it?
          • getRemoteEditLog needs JavaDoc
          • FSEditLog.initJournals is now only called internally from the constructor - hence it can be made private
          • The name change of purgeLogsOlderThan to purgeTransactions seems to be a loss, since it's no longer clear that the transactions to be purged have txid strictly less than the argument. Can we rename to purgeTransactionsOlderThan instead?
            • the string status argument to mapJournalsAndReportErrors there should be updated to match the new function name as well
          • the addition of recoverUnclosedStreams in startLogSegment seems like the wrong spot. Instead, I think it makes more sense to make this a public API of JournalManager and call it (a) at startup, and (b) when we restore a journal manager which was previously marked failed (i.e. in JournalAndStream.startLogSegment, only when stream == null before the call). Later, we will probably want to make this asynchronous in some other thread, so the potentially lengthy recovery doesn't block txns.
          • what's the purpose of the inProgressCache? It seems a bit bug prone (static state scares me), and the file will usually either get renamed to *.corrupt or finalized if you call recoverUnclosedStreams first. I imagine the other functions like getInputStream and getMaxLoadableTransactions will always be called after the recovery step, and thus shouldn't depend on the cache, right?
            • illustration of my above point: there's currently a bug such that countTransactionsInInprogress will always return null the first time you call it on a file (putIfAbsent returns null if it was previously absent). So, I also wonder whether there's sufficient testing of this code path?
          • in recoverUnclosedStreams, refactoring out the rename-to-.corrupt to a moveAsideCorruptFile method as it was before would be nice. This could go in EditLogFile or a static method in FileJournalManager, either way.
          • when you moved FoundEditLog to EditLogFile, you retained file and lastTxId members not being final, since the old finalizeLog method was mutating them. But, now I think EditLogFile is completely immutable, right? So they could be made final again.
          • not sure I follow why getNumberOfTransactions throws a CorruptionException if you've asked for a transaction ID in the middle of a gap. Isn't it more consistent to just return 0? ie that journal just wasn't being written to for that txid?
          • for EditLogFile.getNumTransactions and EditLogValidation.getNumTransactions, it seems like you should only return 0 if both endTxId and startTxId are INVALID_TXID – even then, that seems like a kind of indeterminate state that would be produced by a bug. When would we ever not know the startTxId? If the members were startTxId and numTransactions then we wouldn't have the lack of clarity with what to do with the empty-log case.
          • the new member FSImage.editsDirs is assigned but never read.

          style nits:

          • needless wrapping in recoverUnclosedStreams for assignment of elf var
          • style:
                  if (fromTxId > elf.startTxId
                      && fromTxId <= elf.endTxId) {
            

            should be formatted like:

                  if ((fromTxId > elf.startTxId) &&
                      (fromTxId <= elf.endTxId)) {
            

            ...to conform with other code

          • JavaDoc style: first character after @param or @return should be lowercase, eg:
               * @return The number of transactions available from fromTxnId
            

            ...should have a lower case 'the'

          Potential split out patches: maybe this could be separated into separate patches for some of the following:

          • refactor getEditLogManifest to call into the FileJournalManagers
          • refactor FoundEditLog to new EditLogFile class in FileJournalManager
          • remove use of LoadPlan and instead load from JournalManagers

          Overall questions:

          • what's the story on exclusive locking of journal managers after this? Do we still assume that they'll be locked by FSImage? Or should we push down locking to the journal managers as well?
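          The "better candidate" concern from the review above (the quoted `candidate.getStartTxId() < bestlog.getStartTxId()` condition) can be illustrated with a small sketch. This is a hypothetical example, not the patch code: it shows why, among logs covering fromTxId, preferring the smaller start txid picks up dead weight, since every transaction before fromTxId must be skipped on load.

```java
/** Sketch of the selection fix implied above: among candidates covering
 *  fromTxId, prefer the one starting closest to fromTxId (least dead
 *  weight to skip), breaking ties by how far the log reaches. */
public class BestLogSketch {

  static class Log {
    final long start, end;
    Log(long start, long end) { this.start = start; this.end = end; }
  }

  static Log best(Log[] candidates, long fromTxId) {
    Log bestLog = null;
    for (Log c : candidates) {
      if (c.start > fromTxId || c.end < fromTxId) continue;  // doesn't cover fromTxId
      if (bestLog == null
          || c.start > bestLog.start                          // less dead weight to skip
          || (c.start == bestLog.start && c.end > bestLog.end)) {  // reaches further
        bestLog = c;
      }
    }
    return bestLog;
  }

  public static void main(String[] args) {
    Log[] logs = { new Log(1, 150), new Log(101, 200) };
    Log b = best(logs, 120);
    // Preferring the smaller start (as in the snippet quoted in the review)
    // would pick [1,150] and force skipping txns 1-119; this picks [101,200].
    if (b.start != 101 || b.end != 200) throw new AssertionError();
    System.out.println("best covers [" + b.start + "," + b.end + "]");
  }
}
```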
          Todd Lipcon added a comment -

          Hi Jitendra. Yes, I'm continuing to look at this, and I wish it could be split into some smaller pieces. It's tough to review at this point, and I still think the TransferFsImage stuff is a little incomplete with this patch. Let me formulate my thoughts and write them here... meanwhile, can you see any straightforward ways to split it in two?

          Jitendra Nath Pandey added a comment -

          This jira is blocking a couple of other jiras. A quick resolution to this one will also expedite HDFS-1580.
          Todd, do you have any more concerns with the patch?

          Ivan Kelly added a comment -

          Updated to apply on trunk as HDFS-1073 has gone in.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12487025/HDFS-2018.diff
          against trunk revision 1148348.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 34 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/968//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          Rebased onto latest HDFS-1073 branch. All tests pass.

          Ivan Kelly added a comment -

          in selectInputStream, it's counting both finalized and unfinalized transactions. But at startup, it should be recovering all of the inprogress logs to finalized logs, right? Given that, I don't think we need the API getNumberOfTransactions – ie we only need the finalized one.

          We need both; there are two times at which you need to count the number of transactions in a journal: startup and checkpointing. At startup you want to consider inprogress logs, as they're the result of a crash. For checkpointing you don't, because the primary is still writing to an inprogress log.
          With a file based journal, you cannot tell whether you are starting up or checkpointing without some kind of write lease for the journal, which we don't have now (it may be a nice thing to have in future).
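The two counting paths described above can be sketched as follows. This is a minimal illustration, not the actual HDFS API: the class, the `countTransactions` helper, and the `includeInProgress` flag are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class TxnCountSketch {
    // Hypothetical segment record: {firstTxId, lastTxId, inProgress}.
    static class LogSegment {
        final long firstTxId, lastTxId;
        final boolean inProgress;
        LogSegment(long first, long last, boolean inProgress) {
            this.firstTxId = first; this.lastTxId = last; this.inProgress = inProgress;
        }
    }

    // At startup, in-progress segments are crash leftovers and must be counted;
    // at checkpoint time they are skipped, since the primary is still writing one.
    static long countTransactions(List<LogSegment> segments, long fromTxId,
                                  boolean includeInProgress) {
        long count = 0;
        for (LogSegment s : segments) {
            if (s.inProgress && !includeInProgress) continue;
            if (s.lastTxId < fromTxId) continue;
            count += s.lastTxId - Math.max(s.firstTxId, fromTxId) + 1;
        }
        return count;
    }

    public static void main(String[] args) {
        List<LogSegment> segs = new ArrayList<>();
        segs.add(new LogSegment(1, 100, false));   // finalized segment
        segs.add(new LogSegment(101, 150, true));  // in-progress after a crash
        System.out.println(countTransactions(segs, 1, true));   // startup view: 150
        System.out.println(countTransactions(segs, 1, false));  // checkpoint view: 100
    }
}
```

The same journal thus reports different counts depending on the caller's situation, which is why a single `getNumberOfFinalizedTransactions`-style method would not suffice.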

          the API change on the StorageArchiver interface seems less than ideal – an archiver may very well want to know the txid range of a log to know what to do with it – any way we can preserve this?

          I've put the txid range back into this API. I haven't used the FoundFSImage and FoundEditLog interfaces though, as it would create a circular dependency between StorageInspector and StorageArchiver. Also, FoundEditLog has gone away, so using File and longs makes it more uniform.

          the idea of the "remote edit log manifest" and the way we do edits transfer is inextricably linked to the idea of log segments. But, the new JournalManager APIs are based on the idea that logs are just sequences with no segmenting. I think having both ideas coexist is fairly confusing and a good opening for bugs – eg right now, the JournalManagers can return RemoteEditLogs for any transaction range, but the GetImageServlet still expects files. If edits are to be decoupled from files, then RemoteEditLogs should probably include a URI which identifies an edits transfer method. For FileJournalManager, the URI would be http-based and simply point to the GetImageServlet, but with BK-based logs it would point to the ZK ledger, right?

          Further to what I said about URIs last week, I spoke to Jitendra about this transfer before and he said that the plan was to take this functionality out of band, with rsync or something. Now that image and logs are decoupled this is possible.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12486006/HDFS-2018.diff
          against trunk revision 1144480.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/904//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          New patch addresses all of Todd's comments except two about getEditLogManifest. Will address those in another comment.

          Ivan Kelly added a comment -

          Good comments, I have a 3 hour flight later so I'll try to address them on it. A few points. Firstly, I'd really like to get this into 0.23. Once this patch is in, bookkeeper support can be added as a third party module with very little effort for 0.23, and it can be shipped as a builtin in later releases.

          The remoteEditLogManifest will only be used by file based journals. For bookkeeper the checkpointer will simply be configured to use bookkeeper as the editlog. GetImageServlet will be taken out of the equation for edits.

          Regarding the startup behaviour, I would need to add getFirstTxId() and getLastTxId() members to EditLogInputStream to be able to create a plan beforehand. I'll have a go at this later, and try to submit another patch once I get back near an internet connection.
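The planning idea could look roughly like this. Only the method names getFirstTxId()/getLastTxId() come from the comment above; the `Stream` class and the `coversRange` check are illustrative assumptions, sketching how txid bounds would let the namenode validate a complete load plan before touching the image.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LoadPlanSketch {
    // Stand-in for an EditLogInputStream that exposes its txid bounds.
    static class Stream {
        final long firstTxId, lastTxId;
        Stream(long f, long l) { firstTxId = f; lastTxId = l; }
        long getFirstTxId() { return firstTxId; }
        long getLastTxId()  { return lastTxId; }
    }

    // True iff the streams, taken together, cover [fromTxId, toTxId] with no gaps.
    static boolean coversRange(List<Stream> streams, long fromTxId, long toTxId) {
        streams.sort(Comparator.comparingLong(Stream::getFirstTxId));
        long next = fromTxId;
        for (Stream s : streams) {
            if (s.getFirstTxId() > next) return false;   // gap before this stream
            next = Math.max(next, s.getLastTxId() + 1);
        }
        return next > toTxId;
    }

    public static void main(String[] args) {
        List<Stream> streams = new ArrayList<>();
        streams.add(new Stream(1, 100));
        streams.add(new Stream(101, 150));
        System.out.println(coversRange(streams, 1, 150)); // true: plan is loadable
        streams.add(new Stream(161, 200));
        System.out.println(coversRange(streams, 1, 200)); // false: gap at 151-160
    }
}
```

With such a check, an inaccessible or incomplete set of edits would be detected before the heavy lifting of image loading, restoring the previous behaviour Todd asked about.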

          Todd Lipcon added a comment -

          Some initial notes on the patch:

          • the idea of the "remote edit log manifest" and the way we do edits transfer is inextricably linked to the idea of log segments. But, the new JournalManager APIs are based on the idea that logs are just sequences with no segmenting. I think having both ideas coexist is fairly confusing and a good opening for bugs – eg right now, the JournalManagers can return RemoteEditLogs for any transaction range, but the GetImageServlet still expects files. If edits are to be decoupled from files, then RemoteEditLogs should probably include a URI which identifies an edits transfer method. For FileJournalManager, the URI would be http-based and simply point to the GetImageServlet, but with BK-based logs it would point to the ZK ledger, right?
          • in getEditLogManifest, why don't we use the already-existing JournalManagers instead of recreating them based on storage directories?
          • that same function swallows the IOE with no log
          • same with selectInputStream – creating new JournalManager instead of using existing list. These functions are nearly static except for the "storage" member – maybe they should just be static methods to make them easier to test and clarify their tighter scope?
          • why did you add the no-arg constructor to EditLogLoader? It's used in a few places, but it seems those places then call validateEditLog non-statically even though it's a static function. not sure why.
          • some spurious changes elsewhere in FSEditLogLoader (unchecked cast, change to some case clauses)
          • typo in javadoc: "If the file is stream during the header"
          • rather than using "-1" as special value in this code, how about FSConstants.INVALID_TXID (which has some other negative value so it's more obvious when it gets used)
          • FSEditLogValidation now has non-orthogonal information – the number of valid edits should be exactly endTxId - startTxId + 1, right? We should remove either endTxId or numTransactions
          • perhaps the validate code should also have a check that txids are always increasing by 1?
          +    this.editsDirs = new ArrayList<URI>();
          +    this.editsDirs.addAll(editsDirs);
          

          why not just Lists.newArrayList(editsDirs)?

          • the startup behavior with this patch changes such that the image is always loaded before edits are even considered. In the case that the edits are somehow inaccessible or incomplete, it will still load the image and only then realize that the logs can't be loaded. Can we restore the previous behavior that a full "plan" is made before proceeding with the heavy lifting of image loading?
          • in loadEdits – should log at INFO level each stream that is being loaded. This is important for ops to know what the NN is up to during startup.
          • can we remove some of the dup code between old-format and new format? eg add a function which returns Iterable<EditLogInputStream>? the "old" impl would create them based on inspector.getLatestEditFiles, and the "new" impl would encapsulate the selectInputStream logic.
          • it bugs me that "getLatestEditsFiles()" has a side effect. IMO functions called "get*" should be side effect free.
          • in selectInputStream, it's counting both finalized and unfinalized transactions. But at startup, it should be recovering all of the inprogress logs to finalized logs, right? Given that, I don't think we need the API getNumberOfTransactions – ie we only need the finalized one.
          • style nits: please camel case variable names, eg alllogfiles and logfiles should be allLogFiles, logFiles. Also inprogress -> inProgress.
          • in getLogFiles there's a needless continue at the end of the loop
          • maybe make the comparator used in that function a constant named something like EditLogFile.COMPARE_BY_START_TXID?
          • Add javadoc for CorruptionException.
          • the API change on the StorageArchiver interface seems less than ideal – an archiver may very well want to know the txid range of a log to know what to do with it – any way we can preserve this?
          • static long readAll – unused param nextTxId, unused count var. Seems like this method isn't very useful either.
          • this AbortSpec class in the tests needs some doc - finding it hard to understand the test case. Perhaps a couple log messages interspersed in setupEdits would also be handy both for debugging and for doc purposes.
          • once I figured out what the test was doing, I liked it. Nice tests.
          • a brief javadoc for each of the individual test cases would be handy too - eg explaining what the file layout it's building looks like – perhaps paste output of "find dfs/name/current -name edits*" in the javadoc?
          • in testLoadingWithGaps, why call selectInputStream twice?
          • extract a constant for the build/test/data dir – you're using it lots of places both in TestEditLog and TestFileJournalManager
          • in testEditLogManifest could be more concise to construct editUris: ImmutableList.of(f1.toURI(), f2.toURI(), f3.toURI()).
          • still a TODO about log archiving - I didnt look at the mods in this test file yet.
          • do the functional tests for log archiving pass?
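Two of the review points above (a named sentinel instead of a bare -1, and keeping FSEditLogValidation orthogonal by deriving the transaction count) can be sketched together. The sentinel's actual value in FSConstants may differ; the class and method below are hypothetical illustrations.

```java
public class ValidationSketch {
    // A named sentinel is more obvious than a bare -1 when it leaks into logs.
    static final long INVALID_TXID = -12345;

    // The count is fully determined by the endpoints, so it is derived here
    // rather than stored as a separate, possibly inconsistent field.
    static long numTransactions(long startTxId, long endTxId) {
        if (startTxId == INVALID_TXID || endTxId == INVALID_TXID) return 0;
        return endTxId - startTxId + 1;
    }

    public static void main(String[] args) {
        System.out.println(numTransactions(101, 150));          // 50
        System.out.println(numTransactions(INVALID_TXID, 150)); // 0
    }
}
```

Storing only startTxId and endTxId and computing the count on demand removes the possibility of the three fields disagreeing.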
          Todd Lipcon added a comment -

          If you are 2 weeks away from merging 1073 into trunk

          As soon as the currently patch-available issues are committed, I should be ready to merge. So I think closer to 1 week. If we commit this patch as is, those other patch-available issues will need to be redone. Or, if we commit those first, this patch will need to be redone. So, I think this patch could delay the merge by another 2-3 weeks (I'm on vacation all of next week, which also impacts timing, unfortunately).

          I was thinking the "dual merge" strategy would satisfy both objectives. The first merge (without this patch) would get more people running/testing the new 1073 code and enable us to start a branch for NN HA. Then we finish this work and do a second merge when it's ready.

          If you think it's an absolute requirement, I will try to rebase this patch on top of the other patch-available issues, but it probably won't be ready til 7/18 or so.

          Sanjay Radia added a comment -

          HDFS-1580 (Journal class) is important to get into 0.23 since it will help us abstract away fencing etc; HDFS-1580 is easier after this patch.

          If you are 2 weeks away from merging 1073 into trunk then we should commit this patch into 1073 branch since it does clean up the code and makes 1580 easier and makes the code simpler.

          Todd Lipcon added a comment -

          Hey Ivan/Jitendra. I'm taking a look at this patch, and while I really like the direction it's headed, I'd like to propose that we hold off on committing it until after HDFS-1073 is merged into trunk. My thinking is that we'd finish the remaining tasks on 1073 (per my email to general@ last night), merge it, and then keep the 1073 branch alive to finish the work related to HDFS-1580 and this JIRA.

          My reasoning is that the 1073 merge is a blocker both for the NN HA work and for the 0.23 release. While it would be awesome to get the hooks in for BK in the same timeframe, I don't think it should be a release blocker, since it's truly a new feature.

          Does that make sense? I'll try to review this in parallel while we get the remaining 1073 tasks done.

          Ivan Kelly added a comment -

          Addressed Jitendra's comments. There's one failure in TestCheckpoint which occurs on the plain HDFS-1073 branch as well. This should be trivial to fix in another JIRA.

          Jitendra Nath Pandey added a comment -

          Some comments:
          1. FileJournalManager.java
          getMaxLoadableTransaction can be made private.
          2. JournalManager interface
          Instead of adding archiveLogsOlderThan to the interface, we could add purgeTransactions method (as in HDFS-1580 design). FileJournalManager could implement purgeTransactions using archives, i.e. instead of really deleting, it archives them. Since Checkpoints are being archived, we don't need to force any JournalManager to archive edit logs as well.

          Apart from the above, the patch looks good to me. +1
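The suggested split could be sketched like this. Only the method name purgeTransactions comes from the comment (and the HDFS-1580 design it references); the interface shape and the archiving stand-in below are illustrative assumptions, not the final API.

```java
import java.util.ArrayList;
import java.util.List;

interface PurgingJournalManager {
    // Retire all log segments whose transactions are all older than minTxIdToKeep.
    void purgeTransactions(long minTxIdToKeep);
}

// A file-based implementation may choose to archive rather than delete:
// here, "archiving" is modelled as moving the segment to a second list.
public class PurgeSketch implements PurgingJournalManager {
    final List<long[]> segments = new ArrayList<>();  // {firstTxId, lastTxId}
    final List<long[]> archived = new ArrayList<>();

    @Override
    public void purgeTransactions(long minTxIdToKeep) {
        segments.removeIf(s -> {
            if (s[1] < minTxIdToKeep) {
                archived.add(s);  // retired by archiving, not deleting
                return true;
            }
            return false;
        });
    }

    public static void main(String[] args) {
        PurgeSketch jm = new PurgeSketch();
        jm.segments.add(new long[]{1, 100});
        jm.segments.add(new long[]{101, 200});
        jm.purgeTransactions(101);
        System.out.println(jm.segments.size() + " kept, " + jm.archived.size() + " archived");
    }
}
```

The point of the design is that the interface only promises the transactions are retired; whether that means deletion or archival is an implementation detail of each journal.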

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12485300/HDFS-2018.diff
          against trunk revision 1143106.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 51 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/878//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          I've removed a lot of tests from TestFSImageStorageInspector. The functionality is now tested by the following:

          public void testCurrentStorageInspector() throws IOException {

          • Simply removed editlog stuff

          public void testPlanWithGaps() throws IOException {

          • TestEditLog#testLoadingWithGaps()

          public void testPlanWithInProgressInMiddle() throws IOException {

          • Scenario is not possible. Inprogress files will be finalized before creating a new segment; however, this will lead to the case of InconsistentEndTxIds (see below).

          public void testLogGroupRecoveryNoop() throws IOException {

          • TestFileJournalManager#testNormalOperation

          public void testLogGroupRecoveryMixed() throws IOException {

          • TestFileJournalManager#testInprogressRecovery
          • TestFileJournalManager#testInprogressRecoveryMixed

          public void testLogGroupRecoveryInconsistentEndTxIds() throws IOException {

          • TestFileJournalManager#testManyLogsWithGapWhereInprogressInMiddle
            This isn't a failure case. If log A has 1-2 and log B has 1-5, that means A failed after transaction 2 and wasn't re-enabled until the next roll, if at all. This means that A has a gap from 3-5, so on loading, B will be selected for that segment.

          public void testLogGroupRecoveryInProgress() throws IOException {

          • TestFileJournalManager#testInprogressRecoveryAll

          public void testCurrentSplitEditsAndImage() throws IOException {

          • No longer relevant with storage inspector only handling image files

          public void testLogManifest() throws IOException {

          • TestEditLog#testEditLogManifest

          public void testLogManifestInProgressComesFirst() throws IOException {

          • TestEditLog#testEditLogInprogressComesFirst
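The recovery scenario described above (log A holding 1-2, log B holding 1-5, with B winning for the overlapping range) amounts to picking, for a given start txid, the segment that extends furthest. A minimal sketch, with all names hypothetical:

```java
public class SegmentSelectSketch {
    // segments: {firstTxId, lastTxId} pairs gathered from all journals.
    // For a segment starting at startTxId, pick the one reaching furthest.
    static long bestEndFor(long startTxId, long[][] segments) {
        long best = -1;
        for (long[] s : segments) {
            if (s[0] == startTxId && s[1] > best) best = s[1];
        }
        return best;
    }

    public static void main(String[] args) {
        long[][] segs = { {1, 2}, {1, 5} };  // journal A and journal B
        System.out.println(bestEndFor(1, segs)); // 5: B is selected, A's 3-5 gap is skipped
    }
}
```

Journal A is not treated as corrupt; its copy of 1-2 is simply shadowed by the longer segment, which is why the InconsistentEndTxIds case above isn't a failure.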
          Ivan Kelly added a comment -

          This patch applies over the HDFS-1073 branch as it is on 5th July. The code is complete, so it's ready for review.

          Jitendra Nath Pandey added a comment -

          I'm no longer at Yahoo!. My new email address is jitendra@hortonworks.com. Please update your address book and resend your message.

          Ivan Kelly added a comment -

          New patch rebased on top of the moved hdfs-common/hdfs.

          I haven't tackled archiving yet, and storage inspector tests are simply commented out now, but I will sort all that stuff out on monday.

          For archiving, I think the entry point for the FSEditLog should simply pass in the txnid. StorageArchiver is file specific. I propose we make the StorageArchiver a member of NNStorage, we can make it configurable if need be. Then the FileJournalManager should request it from its storage object and archive that way.
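The proposed wiring (NNStorage owning a configurable archiver that FileJournalManager requests from its storage object) could be sketched as below. All class and method names here are illustrative stand-ins, not the actual HDFS types.

```java
public class ArchiverWiringSketch {
    // Stand-in for the file-specific StorageArchiver interface.
    interface StorageArchiver { void archive(String file); }

    // NNStorage owns the archiver; it could be made configurable here.
    static class NNStorageLike {
        private final StorageArchiver archiver;
        NNStorageLike(StorageArchiver archiver) { this.archiver = archiver; }
        StorageArchiver getArchiver() { return archiver; }
    }

    // The journal manager never constructs an archiver itself;
    // it asks its storage object for one when retiring a file.
    static class FileJournalManagerLike {
        private final NNStorageLike storage;
        FileJournalManagerLike(NNStorageLike storage) { this.storage = storage; }
        void retire(String file) { storage.getArchiver().archive(file); }
    }

    public static void main(String[] args) {
        NNStorageLike storage = new NNStorageLike(f -> System.out.println("archived " + f));
        new FileJournalManagerLike(storage).retire("edits_1-100");
    }
}
```

This keeps FSEditLog's entry point down to just a txnid, while the file-specific archiving policy stays behind NNStorage.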

          Ivan Kelly added a comment -

          Added RemoteEditLogManifest stuff to allow it to take segments from different journals.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12483316/HDFS-2018.diff
          against trunk revision 1138098.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 26 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/808//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          I had forgotten some changes before uploading. This one should pass most tests.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12483307/HDFS-2018.diff
          against trunk revision 1138098.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 26 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/806//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          Removed LoadPlan, and I've started on the tests. The inspector edit log tests have mostly moved to TestFileJournalManager.

          I haven't even thought about archiving yet.

          Ivan Kelly added a comment -

          I think all the patches are moving in the same direction. In fact the code seems functionally the same, just put in different places. Regarding getEditLogManifest, I can change that to use a similar pattern to the loading code.

          Todd Lipcon added a comment -

          Some comments on this patch:

          • general idea seems right...
          • I think some of it overlaps with HDFS-2085 and HDFS-2074, which are awaiting review. Can you take a look at those?
          • for getEditLogManifest I think you need to support the case that different journal managers will have different sets of logs, but we need to be able to transfer all of them. I.e., imagine the case with two edits directories where one fails, comes back, then the other fails. In that case you need to interleave copying txns from both of them when transferring edits to the 2NN.
          • I just opened HDFS-2088 and am about to put a patch up there in a few minutes. That deals with the archiving logic and makes some similar changes (e.g. refactoring some stuff out of FSImageTransactionalStorageInspector into FileJournalManager)

          Let me see if I can merge some of your work into my branch – sorry that I'm a few patches ahead of what's been committed.
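
          To make the interleaving case concrete, here is a minimal, self-contained sketch under assumed names (Segment and chooseChain are illustrative, not the HDFS API): given segment lists from journals that each missed different txn ranges, a contiguous chain can be stitched together by always taking a segment that starts at the next needed txid.

```java
import java.util.*;

// Sketch of interleaving edit log segments from multiple journals when
// building a manifest. Each journal may have gaps (it was offline for a
// while); together they can still cover a contiguous txid range.
// All names here are hypothetical.
public class ManifestSketch {
  static class Segment {
    final long start, end; // inclusive txid range
    Segment(long s, long e) { start = s; end = e; }
  }

  @SafeVarargs
  static List<Segment> chooseChain(long fromTxId, List<Segment>... journals) {
    List<Segment> all = new ArrayList<>();
    for (List<Segment> j : journals) all.addAll(j);
    all.sort(Comparator.comparingLong(s -> s.start));
    List<Segment> chain = new ArrayList<>();
    long next = fromTxId;
    for (Segment s : all) {
      // Greedily take any segment that begins exactly where we left off.
      if (s.start == next) {
        chain.add(s);
        next = s.end + 1;
      }
    }
    return chain;
  }

  public static void main(String[] args) {
    // Journal A missed txns 11-20 (offline); journal B missed 21-30.
    List<Segment> a = Arrays.asList(new Segment(1, 10), new Segment(21, 30));
    List<Segment> b = Arrays.asList(new Segment(11, 20));
    for (Segment s : chooseChain(1, a, b)) {
      System.out.println(s.start + "-" + s.end); // 1-10, 11-20, 21-30
    }
  }
}
```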

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12483194/HDFS-2018.diff
          against trunk revision 1137675.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 20 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/800//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          Forgot to mention: the cleanup I plan to do is to remove LoadPlan, as it's not needed anymore. Also testing, as this needs a good few tests to verify the functionality.

          Ivan Kelly added a comment -

          Another rough patch, which I'll clean up tomorrow. In this patch, multiple journal streams can be used during loading. PreTransaction stuff is confined solely to FSImage; JournalManagers never know about it. Also, file and image loading are completely separate now.

          Todd Lipcon added a comment -

          Keep in mind that at startup we need to support loading transactions from more than one JournalManager, in order to handle the case where the NN was running while one went offline, came back, then the other went offline.

          Ivan Kelly added a comment -

          Yes, I've been mulling this over in the back of my head for the last week, but haven't had a chance to code it yet. I think what I need to do here is create a new method on FSEditLog, something like:

          public class FSEditLog {
            ...

            JournalManager getBestJournal() {
              if (/* pre-transaction layout version */) {
                FSImageOldStorageInspector inspector = new FSImageOldStorageInspector();
                // inspect all storage dirs in this.storage
                return new FilePreTransactionJournalManager(inspector.getEditLogFiles());
              } else {
                // loop through the edit log URIs, creating FileJournalManager
                // objects, and return the one with the most transactions
              }
            }
          }
          

          The creation of the FileJournalManager object can be outsourced to another method. It eventually will be anyhow, as the factory stuff will need to be there.

          Todd Lipcon added a comment -

          Hi Ivan. I've been thinking about this patch the last couple days and I think I agree we need to do this. The difficulty seems to be in how we'll manage the current semantics when dealing with upgrading an old storage directory, while also keeping the code paths manageable.

          Let me look over the diff itself and see how it looks.

          Ivan Kelly added a comment -

          This would also mean removing edits completely from the LoadPlan. I think this is a good idea to decouple image and edits.

          Jitendra Nath Pandey added a comment -

          I think FSImageTransactionalStorageInspector should not inspect the edit logs. The FSImage inspectors should just choose the best image and the maximum transaction id associated with it. Then the best journal should be chosen, outside this class, based purely on the maximum transaction id obtained from the image. The loading of image and edits should be as independent as possible, using transaction ids to establish the correspondence.
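
          A toy sketch of that separation, with purely illustrative names (bestImageTxId, bestJournal — not the HDFS API): the image inspector's only output is a txid, and journal selection is keyed solely on that number.

```java
// Sketch of decoupled image and journal selection: the inspector step
// yields only the best image's txid; picking a journal uses that txid
// alone. Method names and the flat array inputs are hypothetical.
public class SelectionSketch {
  // Inspector's job: pick the image with the highest txid.
  static long bestImageTxId(long[] imageTxIds) {
    long best = -1;
    for (long t : imageTxIds) best = Math.max(best, t);
    return best;
  }

  // Separate step: given each journal's highest available txid, pick the
  // journal that can replay the most transactions past the chosen image.
  // Returns -1 if no journal has anything newer than the image.
  static int bestJournal(long imageTxId, long[] journalMaxTxIds) {
    int best = -1;
    long bestTx = imageTxId;
    for (int i = 0; i < journalMaxTxIds.length; i++) {
      if (journalMaxTxIds[i] > bestTx) {
        bestTx = journalMaxTxIds[i];
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    long img = bestImageTxId(new long[]{100, 250, 175});
    int journal = bestJournal(img, new long[]{300, 400});
    System.out.println(img + " " + journal); // 250 1
  }
}
```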

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12481371/HDFS-2018.diff
          against trunk revision 1130870.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 20 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/698//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          Updated to work on top of latest HDFS-2003 patch

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12480927/HDFS-2018.diff
          against trunk revision 1128987.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 20 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/661//console

          This message is automatically generated.

          Ivan Kelly added a comment -

          First patch. TestFSImageStorageInspector has been disabled for the moment until I see what I can do with it.


            People

            • Assignee:
              Ivan Kelly
              Reporter:
              Ivan Kelly
            • Votes:
              0
              Watchers:
              10

              Dates

              • Created:
                Updated:
                Resolved:
