Hadoop HDFS
HDFS-3055

Implement recovery mode for branch-1

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      This is a new feature. It is documented in hdfs_user_guide.xml.

      Description

      Implement recovery mode for branch-1

      1. HDFS-3055-b1.001.patch
        16 kB
        Colin Patrick McCabe
      2. HDFS-3055-b1.002.patch
        29 kB
        Colin Patrick McCabe
      3. HDFS-3055-b1.003.patch
        25 kB
        Colin Patrick McCabe
      4. HDFS-3055-b1.004.patch
        26 kB
        Colin Patrick McCabe
      5. HDFS-3055-b1.005.patch
        26 kB
        Colin Patrick McCabe
      6. HDFS-3055-b1.006.patch
        26 kB
        Colin Patrick McCabe
      7. HDFS-3055-b1.007.patch
        26 kB
        Colin Patrick McCabe

          Activity

          Colin Patrick McCabe added a comment -
          • initial version
          Colin Patrick McCabe added a comment -
          • add unit test
          • some fixes to NN unclean shutdown (to allow unit test to work)
          • better error reporting for the branch-1 edit log stuff (print out the offset when we encounter a problem)
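
The offset-reporting idea in the last bullet can be illustrated with a small sketch. This is not the code from the patch: the record layout (one opcode byte followed by a 4-byte payload), the class `OffsetReportingSketch`, and the helper `CountingStream` are all hypothetical, chosen only to show how a byte counter lets a loader say where in the log a problem was found.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class OffsetReportingSketch {
    /** Hypothetical wrapper that counts the bytes consumed so far. */
    public static final class CountingStream extends FilterInputStream {
        private long pos = 0;

        public CountingStream(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0) pos++;
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) pos += n;
            return n;
        }

        public long getPos() { return pos; }
    }

    /**
     * Loads records in a made-up format: one opcode byte (0..2 are valid)
     * followed by a 4-byte payload. Returns the number of records read.
     * Any failure is rethrown with the offset of the record that caused it.
     */
    public static int load(byte[] log) throws IOException {
        CountingStream counter = new CountingStream(new ByteArrayInputStream(log));
        DataInputStream in = new DataInputStream(counter);
        int records = 0;
        while (true) {
            long recordStart = counter.getPos();
            int opcode = in.read();
            if (opcode < 0) {
                return records; // clean end of log
            }
            try {
                if (opcode > 2) {
                    throw new IOException("unknown opcode " + opcode);
                }
                in.readInt(); // payload; throws EOFException if truncated
                records++;
            } catch (IOException e) {
                throw new IOException(
                    "Error reading edit log record at offset " + recordStart + ": " + e, e);
            }
        }
    }
}
```

With a message of this shape, an operator can see which record is damaged rather than only that loading failed somewhere.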
          Todd Lipcon added a comment -

          Hi Colin. Is this patch up to date with respect to the trunk version? Have you run the unit tests for branch-1? I'll review it, but want to make sure there aren't any changes in flight.

          Colin Patrick McCabe added a comment -
          • rebase on branch-1

          todd: yes, this is up to date, and I've run the following tests:
          TestCheckpoint,
          TestEditLog,
          TestNameNodeRecovery,
          TestEditLogLoading,
          TestNameNodeMXBean,
          TestSaveNamespace,
          TestSecurityTokenEditLog,
          TestStorageDirectoryFailure,
          TestStorageRestore

          Tsz Wo Nicholas Sze added a comment -

          For the same feature, we usually use the same JIRA for different branches. It is okay that you already created two (HDFS-3004 and this). Please try to get HDFS-3004 to trunk first.

          Colin Patrick McCabe added a comment -
          • update patch to reflect comments from HDFS-3004
          Colin Patrick McCabe added a comment -
          • move askOperator to MetaRecoveryContext::editLogLoaderPrompt
          • remove unnecessary toString() call
          • warn about "losing data from your HDFS filesystem" rather than "losing data from your filesystem"
          Todd Lipcon added a comment -
          • can you explain the changes in FSNamesystem.java?
          • Can you update the logging in the test cases to use StringUtils.stringifyException to match trunk?
          • Did you run all the existing tests in branch-1? The one difference that I can see that might cause a failure is that the IOException thrown during a failed startup used to retain the exception t as its cause, but no longer does.

          Otherwise looks good.
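
The concern about the lost cause in the third bullet can be shown with a minimal sketch. The class and method names here are hypothetical and the message text is invented; only `IOException(String, Throwable)` and `Throwable.getCause()` are standard Java. A test that inspects `getCause()` would pass against the first form and fail against the second.

```java
import java.io.IOException;

public class CauseChainingSketch {
    /** Wraps t and keeps it as the cause, so callers can still inspect it. */
    public static IOException wrapKeepingCause(Throwable t) {
        return new IOException("NameNode startup failed", t);
    }

    /** Folds t into the message only; getCause() on the result returns null. */
    public static IOException wrapDroppingCause(Throwable t) {
        return new IOException("NameNode startup failed: " + t);
    }
}
```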

          Colin Patrick McCabe added a comment -

          > can you explain the changes in FSNamesystem.java?

          That change fixes error handling in FSNamesystem. Previously, we did not call FSNamesystem::shutdown() when initialization failed. This led to the MBeans staying registered. This is irrelevant when running the NameNode normally, since the MBeans are destroyed when the entire process goes away. However, when run from a test context, the next attempt to create a MiniDFSCluster instance will fail with "port in use" or some such error.
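
The general pattern described above, cleaning up on a failed initialization so that a later instance in the same JVM can start, can be sketched as follows. This is an illustration under assumptions, not the FSNamesystem code: the `Service` class, the `DemoStateMBean` interface, and the object name are hypothetical, and the sketch shows a leftover JMX registration rather than the "port in use" symptom mentioned in the comment.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class InitCleanupSketch {
    /** Hypothetical management interface exposed over JMX. */
    public interface DemoStateMBean {
        String getStatus();
    }

    /** Hypothetical service that registers an MBean during initialization. */
    public static final class Service {
        private final MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        private final ObjectName name;

        public Service(String objectName) throws Exception {
            this.name = new ObjectName(objectName);
        }

        public void initialize(boolean failAfterRegistering) throws Exception {
            try {
                DemoStateMBean impl = () -> "initializing";
                server.registerMBean(new StandardMBean(impl, DemoStateMBean.class), name);
                if (failAfterRegistering) {
                    throw new IOException("simulated failure while loading edits");
                }
            } catch (Exception e) {
                // Without this call the MBean would stay registered after a
                // failed startup and block the next instance in the same JVM.
                shutdown();
                throw e;
            }
        }

        public void shutdown() throws Exception {
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
        }

        public boolean isRegistered() {
            return server.isRegistered(name);
        }
    }
}
```

In a long-lived test JVM the second `initialize` call would otherwise fail with an `InstanceAlreadyExistsException`; in a normal process exit the leftover registration would not matter, which matches the reasoning in the comment.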

          > Can you update the logging in the test cases to use StringUtils.stringifyException to match trunk?

          Ok.

          > Did you run all the existing tests in branch-1?

          I ran these tests:
          TestCheckpoint,
          TestEditLog,
          TestNameNodeRecovery,
          TestEditLogLoading,
          TestNameNodeMXBean,
          TestSaveNamespace,
          TestSecurityTokenEditLog,
          TestStorageDirectoryFailure,
          TestStorageRestore

          Colin Patrick McCabe added a comment -
          • TestNameNodeRecovery: use StringUtils instead of StringWriter to serialize exception
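
For context, the hand-rolled `StringWriter` approach that a utility call replaces typically looks like the sketch below. It uses only the JDK; the class and method names are hypothetical, and it is not a copy of the Hadoop `StringUtils` implementation or of the test code in the patch.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StringifySketch {
    /** Renders a throwable, including its stack trace, as a single string. */
    public static String stackTraceToString(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        t.printStackTrace(writer);
        writer.flush();
        return buffer.toString();
    }
}
```

Using one shared helper for this keeps the branch-1 tests consistent with trunk, which is the point of the review request.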
          Todd Lipcon added a comment -

          OK. +1, patch looks good. Please run all the branch-1 unit tests so we don't introduce any other failures - should be OK but best to be safe on the stable branch. When you report back, I'll commit.

          Colin Patrick McCabe added a comment -

          In the docs, refer to -force, not --force

          Colin Patrick McCabe added a comment -

          I reran all the unit tests for branch-1 last night and could trace no failures to this change. Should be good to go.

          Todd Lipcon added a comment -

          Committed to branch-1. Thanks, Colin. Can you please fill in the "Release Note" flag here and in HDFS-3004 pointing out the new feature and giving a reference to where it is documented?

          Matt Foley added a comment -

          Closed upon release of Hadoop-1.1.0.


            People

            • Assignee:
              Colin Patrick McCabe
              Reporter:
              Colin Patrick McCabe
            • Votes:
              0
              Watchers:
              5
