Hadoop HDFS / HDFS-2918

HA: Update HA docs to cover dfsadmin

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: ha
    • Labels: None
    • Target Version/s:

      Description

      dfsadmin currently always uses the first namenode rather than failing over. It should fail over like other clients, unless -fs specifies a particular namenode.

      hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs haadmin -failover nn1 nn2
      Failover from nn1 to nn2 successful
      # nn2 is 8022
      hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -fs localhost:8022 -safemode enter
      Safe mode is ON
      hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -safemode get 
      Safe mode is OFF
      hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -fs localhost:8022 -safemode get
      Safe mode is ON
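
For reference, clients (including DFSAdmin) can only fail over when they are configured with the logical nameservice URI rather than a single NameNode address. A minimal hdfs-site.xml sketch of the client-side HA settings (the nameservice ID `mycluster` and the host:port values here are illustrative, not taken from this issue):

```xml
<!-- Illustrative HA client configuration; nameservice ID and hosts are examples -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>host1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>host2:8022</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With `fs.defaultFS` set to `hdfs://mycluster`, a command like `hdfs dfsadmin -safemode get` would go through the failover proxy provider; passing `-fs` with a concrete host:port, as in the transcript above, pins the command to one specific NameNode.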
      

        Activity

        Aaron T. Myers added a comment -

        Converting to top-level issue with commit of HDFS-1623.

        Uma Maheswara Rao G added a comment -

        Ok, thanks Eli.

        Eli Collins added a comment -

        Yup, I filed HDFS-2922 for that with a patch. We need to update the HA docs to cover the behavior of dfsadmin (e.g. which operations fail over, which don't and why, which should be performed only on the active, and which are OK to do on the standby). Once we can name specific namenode IDs in dfsadmin (HDFS-2916) I think it will be clearer and easier for users. Re-purposing this jira to update the docs.

        Uma Maheswara Rao G added a comment -

        Hi Eli,
        I spent some time on this issue. I don't see any problem with DFSAdmin itself.

        2012-02-09 09:11:29,282 WARN  retry.RetryInvocationHandler (RetryInvocationHandler.java:invoke(106)) - Exception while invoking getStats of class org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB after 0 fail over attempts. Trying to fail over immediately.
        Configured Capacity: 190744240128 (177.64 GB)
        Present Capacity: 74658306048 (69.53 GB)
        DFS Remaining: 74658299904 (69.53 GB)
        DFS Used: 6144 (6 KB)
        DFS Used%: 0%
        Under replicated blocks: 0
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        
        -------------------------------------------------
        Datanodes available: 3 (3 total, 0 dead)
        

        The problem is with the API category (OperationCategory) on the NameNode side. Currently the safemode APIs are allowed on the standby as well; that is the reason the client won't do any failover. Otherwise, DFSAdmin uses nothing but the FileSystem object to invoke the APIs, so it should be able to fail over normally if the HA-related configurations are in place.

        Addressing the TODO in NameNodeRpcServer should solve this issue:

        // TODO:HA decide on OperationCategory for this
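
The effect of that operation-category check can be sketched as a standalone model. This is illustrative Java, not the actual NameNodeRpcServer code; the enum values mirror Hadoop's OperationCategory and HA states, but everything below is a self-contained assumption-laden sketch:

```java
import java.util.EnumSet;

public class OperationCategoryDemo {
    // Mirrors Hadoop's OperationCategory values; this class itself is a model.
    enum OperationCategory { READ, WRITE, UNCHECKED, JOURNAL }
    enum HAState { ACTIVE, STANDBY }

    static class StandbyException extends Exception {}

    // Pretend this NameNode is currently the standby.
    static HAState state = HAState.STANDBY;

    // Models the idea behind checkOperation(): the standby rejects READ/WRITE
    // with StandbyException, which the client's RetryInvocationHandler
    // interprets as "fail over to the other NameNode". UNCHECKED operations
    // are served directly, so no failover ever happens for them.
    static void checkOperation(OperationCategory op) throws StandbyException {
        if (state == HAState.STANDBY
                && !EnumSet.of(OperationCategory.UNCHECKED,
                               OperationCategory.JOURNAL).contains(op)) {
            throw new StandbyException();
        }
    }

    public static void main(String[] args) {
        try {
            // An UNCHECKED safemode call is answered by the standby itself,
            // matching the transcript where -safemode get returned the
            // standby's state instead of failing over.
            checkOperation(OperationCategory.UNCHECKED);
            System.out.println("UNCHECKED op served by standby (no failover)");
            checkOperation(OperationCategory.WRITE);
        } catch (StandbyException e) {
            System.out.println("WRITE op rejected; client would fail over");
        }
    }
}
```

Deciding a stricter category for the safemode RPCs would make the standby throw, and the retry handler would then route the call to the active NameNode.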


          People

          • Assignee: Unassigned
          • Reporter: Eli Collins
          • Votes: 0
          • Watchers: 4
