Hadoop HDFS / HDFS-1073

Simpler model for Namenode's fs Image and edit Logs


Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.23.0
    • 0.23.0
    • None
    • None
    • Incompatible change, Reviewed
    • Release Note: The NameNode's storage layout for its name directories has been reorganized to be more robust. Each edit now has a unique transaction ID, and each file is associated with a transaction ID (for checkpoints) or a range of transaction IDs (for edit logs).
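
To make the txid-based layout in the release note concrete, here is a minimal Java sketch of how a checkpoint file could carry one transaction ID and an edit log file could carry a range. The class and method names are hypothetical, and the `fsimage_`/`edits_` prefixes and the 19-digit zero padding are assumptions made for this illustration; this page does not specify the exact on-disk format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not the NameNode's actual code.
class TxidFileNames {
    // Assumed prefixes for illustration only.
    static final String IMAGE_PREFIX = "fsimage_";
    static final String EDITS_PREFIX = "edits_";

    // A checkpoint is tied to the single txid it reflects.
    // Assumption: pad to 19 digits, enough for any non-negative long.
    static String imageName(long txid) {
        if (txid < 0) {
            throw new IllegalArgumentException("negative txid: " + txid);
        }
        return String.format("%s%019d", IMAGE_PREFIX, txid);
    }

    // An edit log is tied to the inclusive range of txids it contains.
    static String editsName(long firstTxid, long lastTxid) {
        if (firstTxid < 0 || lastTxid < firstTxid) {
            throw new IllegalArgumentException(
                "invalid txid range: " + firstTxid + "-" + lastTxid);
        }
        return String.format("%s%019d-%019d", EDITS_PREFIX, firstTxid, lastTxid);
    }

    // Recover the first txid of the range from an edits file name.
    static long firstTxidOf(String editsName) {
        int dash = editsName.indexOf('-');
        return Long.parseLong(editsName.substring(EDITS_PREFIX.length(), dash));
    }

    // Recover the last txid of the range from an edits file name.
    static long lastTxidOf(String editsName) {
        int dash = editsName.indexOf('-');
        return Long.parseLong(editsName.substring(dash + 1));
    }

    public static void main(String[] args) {
        List<String> logs = new ArrayList<>();
        logs.add(editsName(201, 3000));
        logs.add(editsName(1, 10));
        logs.add(editsName(11, 200));

        // With fixed-width padding, plain string order matches txid order.
        Collections.sort(logs);
        for (String log : logs) {
            System.out.println(log + " covers txids "
                + firstTxidOf(log) + ".." + lastTxidOf(log));
        }
        System.out.println(imageName(3000));
    }
}
```

Under these assumptions, a directory listing sorted by name is also sorted by transaction ID, so a loader could pick the newest checkpoint and then the edit logs whose ranges follow it without consulting timestamps.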

    Description

      The naming and handling of the NN's fsImage and edit logs can be significantly improved, resulting in simpler and more robust code.

      Attachments

        1. ASF.LICENSE.NOT.GRANTED--hdfs1073.pdf (88 kB, Todd Lipcon)
        2. hdfs1073.pdf (189 kB, Todd Lipcon)
        3. hdfs1073.pdf (159 kB, Todd Lipcon)
        4. hdfs1073.tex (27 kB, Todd Lipcon)
        5. hdfs-1073.txt (207 kB, Todd Lipcon)
        6. hdfs-1073-editloading-algos.txt (37 kB, Todd Lipcon)
        7. hdfs-1073-merge.patch (734 kB, Todd Lipcon)
        8. hdfs-1073-merge.patch (740 kB, Todd Lipcon)
        9. hdfs-1073-merge.patch (738 kB, Todd Lipcon)

        Issue Links

        1. Refactor edit log loading to a separate class from edit log writing (Sub-task, Closed, Todd Lipcon)
        2. Refactor storage management into separate classes than fsimage file reading/writing (Sub-task, Closed, Todd Lipcon)
        3. Remove intentionally corrupt 0.13 directory layout creation (Sub-task, Closed, Todd Lipcon)
        4. Persist transaction ID on disk between NN restarts (Sub-task, Resolved, Todd Lipcon)
        5. Refactor more startup and image loading code out of FSImage (Sub-task, Resolved, Todd Lipcon)
        6. Add code to detect valid length of an edits file (Sub-task, Resolved, Todd Lipcon)
        7. Add code to inspect a storage directory with txid-based filenames (Sub-task, Resolved, Todd Lipcon)
        8. Add code to list which edit logs are available on a remote NN (Sub-task, Resolved, Todd Lipcon)
        9. Refactor log rolling and filename management out of FSEditLog (Sub-task, Resolved, Todd Lipcon)
        10. reduce need to rewrite fsimage on startup (Sub-task, Resolved, Todd Lipcon)
        11. Extend image checksumming to function with multiple fsimage files (Sub-task, Resolved, Todd Lipcon)
        12. Remove use of timestamps to identify checkpoints and logs (Sub-task, Resolved, Todd Lipcon)
        13. Add migration tests from old-format to new-format storage (Sub-task, Resolved, Unassigned)
        14. Add state management variables to FSEditLog (Sub-task, Resolved, Todd Lipcon)
        15. Add some convenience functions to iterate over edit log streams (Sub-task, Resolved, Todd Lipcon)
        16. Update HDFS-1073 branch to deal with OP_INVALID-filled preallocation (Sub-task, Resolved, Todd Lipcon)
        17. Change edit logs and images to be named based on txid (Sub-task, Resolved, Todd Lipcon)
        18. Add constants for LAYOUT_VERSIONs in edits log branch (Sub-task, Resolved, Todd Lipcon)
        19. Additional QA tasks for Edit Log branch (Sub-task, Resolved, Todd Lipcon)
        20. Remove references to StorageDirectory from JournalManager interface (Sub-task, Resolved, Ivan Kelly)
        21. TestDFSUpgrade failing in HDFS-1073 branch (Sub-task, Resolved, Todd Lipcon)
        22. HDFS-1073: Fix backupnode for new edits/image layout (Sub-task, Resolved, Todd Lipcon)
        23. 1073: Enable multiple checkpointers to run simultaneously (Sub-task, Resolved, Todd Lipcon)
        24. HDFS-1073: Cleanup in image transfer servlet (Sub-task, Resolved, Todd Lipcon)
        25. HDFS-1073: Test for 2NN downloading image is not running (Sub-task, Resolved, Todd Lipcon)
        26. HDFS-1073: Some refactoring of 2NN to easier share code with BN and CN (Sub-task, Resolved, Todd Lipcon)
        27. Remove vestiges of NNStorageListener (Sub-task, Resolved, Todd Lipcon)
        28. TestCheckpoint needs to clean up between cases (Sub-task, Resolved, Todd Lipcon)
        29. Fix race conditions when running two rapidly checkpointing 2NNs (Sub-task, Resolved, Todd Lipcon)
        30. Image transfer process misreports client side exceptions (Sub-task, Resolved, Todd Lipcon)
        31. HDFS-1073: Kill previous.checkpoint, lastcheckpoint.tmp directories (Sub-task, Resolved, Todd Lipcon)
        32. Clean up and test behavior under failed edit streams (Sub-task, Resolved, Aaron Myers)
        33. 1073: Remove checkpointTxId from VERSION file (Sub-task, Resolved, Todd Lipcon)
        34. 1073: remove/archive unneeded/old storage files (Sub-task, Resolved, Todd Lipcon)
        35. 1073: 2NN needs to handle case of reformatted NN better (Sub-task, Resolved, Todd Lipcon)
        36. 1073: Image inspector should return finalized logs before unfinalized logs (Sub-task, Resolved, Todd Lipcon)
        37. 1073: Improve TestNamespace and TestEditLog in 1073 branch (Sub-task, Resolved, Todd Lipcon)
        38. 1073: Improve upgrade tests from 0.22 (Sub-task, Resolved, Todd Lipcon)
        39. 1073: determine edit log validity by truly reading and validating transactions (Sub-task, Resolved, Todd Lipcon)
        40. 1073: address checkpoint upload when one of the storage dirs is failed (Sub-task, Resolved, Todd Lipcon)
        41. 1073: NN should not clear storage directory when restoring removed storage (Sub-task, Resolved, Todd Lipcon)
        42. 1073: create an escape hatch to ignore startup consistency problems (Sub-task, Resolved, Colin McCabe)
        43. 1073: finalize inprogress edit logs at startup (Sub-task, Resolved, Todd Lipcon)
        44. 1073: Move edits log archiving logic into FSEditLog/JournalManager (Sub-task, Resolved, Todd Lipcon)
        45. 1073: Handle case where an entirely empty log is left during NN crash (Sub-task, Resolved, Todd Lipcon)
        46. 1073: consider adding END_LOG_SEGMENT txn when finalizing inprogress logs at startup (Sub-task, Open, Unassigned)
        47. 1073: update remaining unit tests to new storage filenames (Sub-task, Resolved, Todd Lipcon)
        48. 1073: Add a flag to 2NN to format its checkpoint dirs on startup (Sub-task, Resolved, Todd Lipcon)
        49. 1073: Checkpoint interval should be based on txn count, not size (Sub-task, Resolved, Todd Lipcon)
        50. 1073: address remaining TODOs and pre-merge cleanup (Sub-task, Resolved, Todd Lipcon)
        51. 1073: fix regression of HDFS-1955 in branch (Sub-task, Resolved, Todd Lipcon)
        52. 1073: Fault injection for StorageDirectory failures during read/write of FSImage/Edits files (Sub-task, Open, Unassigned)
        53. 1073: Zero pad edits filename to make them lexically sortable (Sub-task, Resolved, Ivan Kelly)
        54. 1073: Move all journal stream management code into one place (Sub-task, Closed, Ivan Kelly)
        55. 1073: fix CreateEditsLog test tool in branch (Sub-task, Resolved, Todd Lipcon)
        56. 1073: Reenable TestEditLog.testFailedOpen and fix exposed bug (Sub-task, Resolved, Todd Lipcon)
        57. 1073: clean up TestCheckpoint and remove TODOs (Sub-task, Resolved, Todd Lipcon)
        58. 1073: Address remaining TODOs (Sub-task, Resolved, Todd Lipcon)
        59. 1073: address findbugs/javadoc warnings (Sub-task, Resolved, Todd Lipcon)
        60. saveNamespace should not throw IOE when only one storage directory fails to write VERSION file (Sub-task, Resolved, Andras Bokor)
        61. Complete decoupling of failure states between edits and image dirs (Sub-task, Open, Unassigned)


          People

            Assignee: Todd Lipcon (tlipcon)
            Reporter: Sanjay Radia (sanjay.radia)
            Votes: 0
            Watchers: 49
