Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.2
    • Component/s: namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When a storage directory is inaccessible, the namenode moves it from the list of valid storage directories to a removedStorageDirs list. Those storage directories are not restored when they become healthy again.

      The proposed solution is to restore previously failed directories at the beginning of a checkpoint (for example, in rollEdits) by copying the necessary metadata files from a healthy directory to the unhealthy ones. This way, whenever the administrator recovers a failed storage directory, he or she can immediately force a checkpoint to restore it.

      See also HADOOP-4885.
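      To make the proposed mechanism concrete, here is a minimal Java sketch, assuming a simplified flat directory layout. Only the name removedStorageDirs comes from the description; the class StorageDirRestorer, the storageDirs list, the methods reportFailedDir and attemptRestoreRemovedStorage, and the list of copied metadata file names are hypothetical illustrations, not the patch committed under HADOOP-4885.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch: restore failed storage directories at the start of a
 * checkpoint (e.g. rollEdits) by copying metadata from a healthy directory.
 * Not the actual namenode implementation.
 */
class StorageDirRestorer {
    // Hypothetical metadata files copied from a healthy directory (flat layout assumed).
    static final List<String> METADATA_FILES = List.of("VERSION", "fsimage", "edits");

    final List<Path> storageDirs = new ArrayList<>();        // currently valid directories
    final List<Path> removedStorageDirs = new ArrayList<>(); // directories that failed

    /** Moves a directory to removedStorageDirs after an I/O failure. */
    void reportFailedDir(Path dir) {
        if (storageDirs.remove(dir)) {
            removedStorageDirs.add(dir);
        }
    }

    /** Intended to run at the beginning of a checkpoint, before new edits are written. */
    void attemptRestoreRemovedStorage() {
        if (storageDirs.isEmpty()) {
            return; // no healthy directory to copy metadata from
        }
        Path healthy = storageDirs.get(0);
        Iterator<Path> it = removedStorageDirs.iterator();
        while (it.hasNext()) {
            Path dir = it.next();
            if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
                continue; // still unhealthy; try again at the next checkpoint
            }
            try {
                for (String name : METADATA_FILES) {
                    Path src = healthy.resolve(name);
                    if (Files.exists(src)) {
                        Files.copy(src, dir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
                    }
                }
                // Copy succeeded: move the directory back to the valid list.
                it.remove();
                storageDirs.add(dir);
            } catch (IOException e) {
                // Copy failed: leave the directory in removedStorageDirs.
            }
        }
    }
}
```

      The sketch only retries at checkpoint time, which matches the idea that an administrator who has repaired a directory can force a checkpoint to bring it back; a directory that is still missing or read-only simply stays in removedStorageDirs.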

        Issue Links

          Activity

          Transition             Time In Source Status  Execution Times  Last Executer        Last Execution Date
          Open -> Resolved       17m 10s                1                Eli Collins          11/Mar/12 22:34
          Resolved -> Reopened   2d 19h 56m             1                Tsz Wo Nicholas Sze  14/Mar/12 18:30
          Reopened -> Resolved   17m 49s                1                Tsz Wo Nicholas Sze  14/Mar/12 18:48
          Resolved -> Closed     21d 23h 53m            1                Matt Foley           05/Apr/12 19:42
          Matt Foley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Brandon Li made changes -
          Link This issue is related to HDFS-3127 [ HDFS-3127 ]
          Matt Foley made changes -
          Fix Version/s 1.1.0 [ 12317959 ]
          Suresh Srinivas made changes -
          Fix Version/s 1.0.2 [ 12320051 ]
          Affects Version/s 1.0.0 [ 12318243 ]
          Target Version/s 1.0.2 [ 12320051 ]
          Eli Collins added a comment -

          Sorry, posted to the wrong jira!

          Tsz Wo Nicholas Sze made changes -
          Affects Version/s 0.24.0 [ 12317653 ]
          Affects Version/s 1.1.0 [ 12317959 ]
          Tsz Wo Nicholas Sze made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 1.1.0 [ 12317959 ]
          Resolution Fixed [ 1 ]
          Tsz Wo Nicholas Sze added a comment -

          I have committed this (the patch was posted on HADOOP-4885). Thanks, Brandon!

          Tsz Wo Nicholas Sze added a comment -

          Brandon has already posted a patch on HADOOP-4885. He has also run all the unit tests.

          Jitendra and I have reviewed the patch.

          Tsz Wo Nicholas Sze made changes -
          Link This issue relates to HADOOP-4885 [ HADOOP-4885 ]
          Tsz Wo Nicholas Sze made changes -
          Summary "Add mechanism to restore the removed storage directories" -> "Backport HADOOP-4885 to branch-1"
          Description When a storage directory is inaccessible, the namenode moves it from the list of valid storage directories to a removedStorageDirs list. Those storage directories are not restored when they become healthy again.

          The proposed solution is to restore previously failed directories at the beginning of a checkpoint (for example, in rollEdits) by copying the necessary metadata files from a healthy directory to the unhealthy ones. This way, whenever the administrator recovers a failed storage directory, he or she can immediately force a checkpoint to restore it.
          When a storage directory is inaccessible, the namenode moves it from the list of valid storage directories to a removedStorageDirs list. Those storage directories are not restored when they become healthy again.

          The proposed solution is to restore previously failed directories at the beginning of a checkpoint (for example, in rollEdits) by copying the necessary metadata files from a healthy directory to the unhealthy ones. This way, whenever the administrator recovers a failed storage directory, he or she can immediately force a checkpoint to restore it.

          See also HADOOP-4885.
          Tsz Wo Nicholas Sze made changes -
          Resolution Duplicate [ 3 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Tsz Wo Nicholas Sze added a comment -

          @Uma, you are right that HADOOP-4885 has already fixed this, so this one is a backport. I will revise the title.

          @Eli, this is not a dupe of HDFS-2781.

          Eli Collins made changes -
          Field Original Value New Value
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          Eli Collins added a comment -

          This is a dupe of HDFS-2781. Brandon, feel free to post a patch there.

          Uma Maheswara Rao G added a comment -

          Hi Brandon,
          It seems to me that you are looking at the same issue (HADOOP-4885), which has already been addressed, right?
          We also have a property, "dfs.namenode.name.dir.restore", to enable or disable that feature.
          Are you talking about some other issue or improvement here?

          Brandon Li created issue -

            People

            • Assignee:
              Brandon Li
              Reporter:
              Brandon Li
            • Votes:
              0
              Watchers:
              4

              Dates

              • Created:
                Updated:
                Resolved:
