  1. Hadoop HDFS
  2. HDFS-1073

Simpler model for Namenode's fs Image and edit Logs


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Incompatible change, Reviewed
    • Release Note: The NameNode's storage layout for its name directories has been reorganized to be more robust. Each edit now has a unique transaction ID, and each file is associated with a transaction ID (for checkpoints) or a range of transaction IDs (for edit logs).
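
To illustrate the idea of tying each file to a transaction ID or a range of transaction IDs, here is a minimal Java sketch. The class name, method names, file prefixes, and the 19-digit padding width are assumptions made for this example and are not taken from the patch itself. It also shows why zero-padding the IDs (one of the sub-tasks listed below) makes the file names sort lexically in transaction order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: all names and formats here are hypothetical.
public class TxidFileNames {
    // Assumed prefixes for checkpoint images and edit log segments.
    static final String IMAGE_PREFIX = "fsimage";
    static final String EDITS_PREFIX = "edits";
    // Long.MAX_VALUE has 19 decimal digits, so 19 is enough for any
    // non-negative long transaction ID.
    static final String TXID_FORMAT = "%019d";

    // A checkpoint is associated with a single transaction ID.
    static String imageName(long txid) {
        return String.format(Locale.ROOT, IMAGE_PREFIX + "_" + TXID_FORMAT, txid);
    }

    // An edit log is associated with an inclusive range of transaction IDs.
    static String editsName(long firstTxid, long lastTxid) {
        if (firstTxid > lastTxid) {
            throw new IllegalArgumentException("firstTxid must not exceed lastTxid");
        }
        return String.format(Locale.ROOT,
                EDITS_PREFIX + "_" + TXID_FORMAT + "-" + TXID_FORMAT,
                firstTxid, lastTxid);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add(editsName(101, 250));
        names.add(editsName(1, 100));
        names.add(editsName(1001, 1200));
        // Plain string sorting yields transaction order because of the padding.
        Collections.sort(names);
        for (String name : names) {
            System.out.println(name);
        }
        System.out.println(imageName(100));
    }
}
```

Without the padding, a segment starting at transaction 1001 would sort before one starting at 101 under plain string comparison, so a directory listing would not reflect the order in which the edits were written.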

    Description

      The naming and handling of the NN's fsImage and edit logs can be significantly improved, resulting in simpler and more robust code.

      Attachments

        1. ASF.LICENSE.NOT.GRANTED--hdfs1073.pdf
          88 kB
          Todd Lipcon
        2. hdfs-1073.txt
          207 kB
          Todd Lipcon
        3. hdfs-1073-editloading-algos.txt
          37 kB
          Todd Lipcon
        4. hdfs1073.pdf
          159 kB
          Todd Lipcon
        5. hdfs1073.pdf
          189 kB
          Todd Lipcon
        6. hdfs1073.tex
          27 kB
          Todd Lipcon
        7. hdfs-1073-merge.patch
          738 kB
          Todd Lipcon
        8. hdfs-1073-merge.patch
          740 kB
          Todd Lipcon
        9. hdfs-1073-merge.patch
          734 kB
          Todd Lipcon

        Issue Links

          1. Refactor edit log loading to a separate class from edit log writing (Sub-task, Closed, Todd Lipcon)
          2. Refactor storage management into separate classes than fsimage file reading/writing (Sub-task, Closed, Todd Lipcon)
          3. Remove intentionally corrupt 0.13 directory layout creation (Sub-task, Closed, Todd Lipcon)
          4. Persist transaction ID on disk between NN restarts (Sub-task, Resolved, Todd Lipcon)
          5. Refactor more startup and image loading code out of FSImage (Sub-task, Resolved, Todd Lipcon)
          6. Add code to detect valid length of an edits file (Sub-task, Resolved, Todd Lipcon)
          7. Add code to inspect a storage directory with txid-based filenames (Sub-task, Resolved, Todd Lipcon)
          8. Add code to list which edit logs are available on a remote NN (Sub-task, Resolved, Todd Lipcon)
          9. Refactor log rolling and filename management out of FSEditLog (Sub-task, Resolved, Todd Lipcon)
          10. reduce need to rewrite fsimage on startup (Sub-task, Resolved, Todd Lipcon)
          11. Extend image checksumming to function with multiple fsimage files (Sub-task, Resolved, Todd Lipcon)
          12. Remove use of timestamps to identify checkpoints and logs (Sub-task, Resolved, Todd Lipcon)
          13. Add migration tests from old-format to new-format storage (Sub-task, Resolved, Unassigned)
          14. Add state management variables to FSEditLog (Sub-task, Resolved, Todd Lipcon)
          15. Add some convenience functions to iterate over edit log streams (Sub-task, Resolved, Todd Lipcon)
          16. Update HDFS-1073 branch to deal with OP_INVALID-filled preallocation (Sub-task, Resolved, Todd Lipcon)
          17. Change edit logs and images to be named based on txid (Sub-task, Resolved, Todd Lipcon)
          18. Add constants for LAYOUT_VERSIONs in edits log branch (Sub-task, Resolved, Todd Lipcon)
          19. Additional QA tasks for Edit Log branch (Sub-task, Resolved, Todd Lipcon)
          20. Remove references to StorageDirectory from JournalManager interface (Sub-task, Resolved, Ivan Kelly)
          21. TestDFSUpgrade failing in HDFS-1073 branch (Sub-task, Resolved, Todd Lipcon)
          22. HDFS-1073: Fix backupnode for new edits/image layout (Sub-task, Resolved, Todd Lipcon)
          23. 1073: Enable multiple checkpointers to run simultaneously (Sub-task, Resolved, Todd Lipcon)
          24. HDFS-1073: Cleanup in image transfer servlet (Sub-task, Resolved, Todd Lipcon)
          25. HDFS-1073: Test for 2NN downloading image is not running (Sub-task, Resolved, Todd Lipcon)
          26. HDFS-1073: Some refactoring of 2NN to easier share code with BN and CN (Sub-task, Resolved, Todd Lipcon)
          27. Remove vestiges of NNStorageListener (Sub-task, Resolved, Todd Lipcon)
          28. TestCheckpoint needs to clean up between cases (Sub-task, Resolved, Todd Lipcon)
          29. Fix race conditions when running two rapidly checkpointing 2NNs (Sub-task, Resolved, Todd Lipcon)
          30. Image transfer process misreports client side exceptions (Sub-task, Resolved, Todd Lipcon)
          31. HDFS-1073: Kill previous.checkpoint, lastcheckpoint.tmp directories (Sub-task, Resolved, Todd Lipcon)
          32. Clean up and test behavior under failed edit streams (Sub-task, Resolved, Aaron Myers)
          33. 1073: Remove checkpointTxId from VERSION file (Sub-task, Resolved, Todd Lipcon)
          34. 1073: remove/archive unneeded/old storage files (Sub-task, Resolved, Todd Lipcon)
          35. 1073: 2NN needs to handle case of reformatted NN better (Sub-task, Resolved, Todd Lipcon)
          36. 1073: Image inspector should return finalized logs before unfinalized logs (Sub-task, Resolved, Todd Lipcon)
          37. 1073: Improve TestNamespace and TestEditLog in 1073 branch (Sub-task, Resolved, Todd Lipcon)
          38. 1073: Improve upgrade tests from 0.22 (Sub-task, Resolved, Todd Lipcon)
          39. 1073: determine edit log validity by truly reading and validating transactions (Sub-task, Resolved, Todd Lipcon)
          40. 1073: address checkpoint upload when one of the storage dirs is failed (Sub-task, Resolved, Todd Lipcon)
          41. 1073: NN should not clear storage directory when restoring removed storage (Sub-task, Resolved, Todd Lipcon)
          42. 1073: create an escape hatch to ignore startup consistency problems (Sub-task, Resolved, Colin McCabe)
          43. 1073: finalize inprogress edit logs at startup (Sub-task, Resolved, Todd Lipcon)
          44. 1073: Move edits log archiving logic into FSEditLog/JournalManager (Sub-task, Resolved, Todd Lipcon)
          45. 1073: Handle case where an entirely empty log is left during NN crash (Sub-task, Resolved, Todd Lipcon)
          46. 1073: consider adding END_LOG_SEGMENT txn when finalizing inprogress logs at startup (Sub-task, Open, Unassigned)
          47. 1073: update remaining unit tests to new storage filenames (Sub-task, Resolved, Todd Lipcon)
          48. 1073: Add a flag to 2NN to format its checkpoint dirs on startup (Sub-task, Resolved, Todd Lipcon)
          49. 1073: Checkpoint interval should be based on txn count, not size (Sub-task, Resolved, Todd Lipcon)
          50. 1073: address remaining TODOs and pre-merge cleanup (Sub-task, Resolved, Todd Lipcon)
          51. 1073: fix regression of HDFS-1955 in branch (Sub-task, Resolved, Todd Lipcon)
          52. 1073: Fault injection for StorageDirectory failures during read/write of FSImage/Edits files (Sub-task, Open, Unassigned)
          53. 1073: Zero pad edits filename to make them lexically sortable (Sub-task, Resolved, Ivan Kelly)
          54. 1073: Move all journal stream management code into one place (Sub-task, Closed, Ivan Kelly)
          55. 1073: fix CreateEditsLog test tool in branch (Sub-task, Resolved, Todd Lipcon)
          56. 1073: Reenable TestEditLog.testFailedOpen and fix exposed bug (Sub-task, Resolved, Todd Lipcon)
          57. 1073: clean up TestCheckpoint and remove TODOs (Sub-task, Resolved, Todd Lipcon)
          58. 1073: Address remaining TODOs (Sub-task, Resolved, Todd Lipcon)
          59. 1073: address findbugs/javadoc warnings (Sub-task, Resolved, Todd Lipcon)
          60. saveNamespace should not throw IOE when only one storage directory fails to write VERSION file (Sub-task, Resolved, Andras Bokor)
          61. Complete decoupling of failure states between edits and image dirs (Sub-task, Open, Unassigned)


            People

              Assignee: Todd Lipcon (tlipcon)
              Reporter: Sanjay Radia (sanjay.radia)
              Votes: 0
              Watchers: 48
