  1. Hadoop HDFS
  2. HDFS-8693

refreshNamenodes does not support adding a new standby to a running DN

Details

    • Reviewed

    Description

      I tried to run the following command on a Hadoop 2.6.0 cluster with HA support

      $ hdfs dfsadmin -refreshNamenodes datanode-host:port

      to refresh the name node list on the data nodes after replacing one name node with a new one, so that I would not need to restart the data nodes. However, I got the following error:

      refreshNamenodes: HA does not currently support adding a new standby to a running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.

      I checked the 2.6.0 code and the error was thrown by the following code snippet, which led me to this JIRA.

      void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
        Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
        for (BPServiceActor actor : bpServices) {
          oldAddrs.add(actor.getNNSocketAddress());
        }
        Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);

        if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
          // Keep things simple for now -- we can implement this at a later date.
          throw new IOException(
              "HA does not currently support adding a new standby to a running DN. " +
              "Please do a rolling restart of DNs to reconfigure the list of NNs.");
        }
      }

      It looks like the refreshNamenodes command is an incomplete feature: any difference between the old and new name node address sets is rejected.
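      To illustrate what the check above leaves unimplemented, here is a minimal sketch of splitting the address change into additions and removals instead of rejecting any difference. This is not the attached patch and not Hadoop code: the class NamenodeListDiff and its methods added and removed are hypothetical names, and the sketch only computes the two sets. Starting or stopping the corresponding BPServiceActor instances is left out.

```java
import java.net.InetSocketAddress;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not part of Hadoop): describes how a new list of
 * name node addresses differs from the one a data node currently uses.
 */
final class NamenodeListDiff {

  private NamenodeListDiff() {
    // Static helpers only.
  }

  /** Addresses present in newAddrs but not in oldAddrs (name nodes to connect to). */
  static Set<InetSocketAddress> added(Set<InetSocketAddress> oldAddrs,
                                      Set<InetSocketAddress> newAddrs) {
    Set<InetSocketAddress> result = new HashSet<>(newAddrs);
    result.removeAll(oldAddrs);
    return result;
  }

  /** Addresses present in oldAddrs but not in newAddrs (name nodes to disconnect from). */
  static Set<InetSocketAddress> removed(Set<InetSocketAddress> oldAddrs,
                                        Set<InetSocketAddress> newAddrs) {
    Set<InetSocketAddress> result = new HashSet<>(oldAddrs);
    result.removeAll(newAddrs);
    return result;
  }
}
```

      With an old list of {nn1, nn2} and a new list of {nn1, nn3}, added would contain only nn3 and removed only nn2, so the connection to nn1 would not need to be touched. The union of the two results is the same set that Sets.symmetricDifference returns in the snippet above.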

      Unfortunately, bringing up a new name node on a replacement instance is critical for auto-provisioning a Hadoop cluster with HDFS HA support. Without this support, the HA feature cannot really be used. I also observed that the new standby name node on the replacement instance can get stuck in safe mode because no data nodes check in with it. Even a rolling restart may take quite some time on a big cluster, for example one with 4000 data nodes, and restarting DNs is intrusive and not a preferable operation in production. It also increases the chance of a double failure, because the standby name node is not ready for a failover if the current active name node fails.

      Attachments

        1. HDFS-8693.02.patch
          5 kB
          Ajith S
        2. HDFS-8693.03.patch
          5 kB
          Ajith S
        3. HDFS-8693.1.patch
          5 kB
          Ajith S
        4. HDFS-8693-03-addendum.patch
          1 kB
          Brahma Reddy Battula
        5. HDFS-8693-03-Addendum-branch-2.patch
          3 kB
          Brahma Reddy Battula

            People

              ajithshetty Ajith S
              john.jian.fang Jian Fang
              Votes: 1
              Watchers: 16
