Details

    Description

      HDFS-2231 started the process of adding configuration for HA, but one piece is missing. The current state of the configuration is, I believe:
      dfs.ha.namenodes - a list of identifiers for HA namenodes
      dfs.federation.nameservices - a list of federated nameservices
      dfs.namenode.rpc-address[.nameservice-id][.namenode-id] - some specific config for the given namenode. If HA or federation is disabled, the extra components can be elided for backwards compatibility.

      The issue here is that there is no easy way to discern which NN is paired with which other NN. Additionally, adding a new federated nameservice to a config will require changes to dfs.ha.namenodes which makes templating harder. It would be simpler to change dfs.ha.namenodes to be nameservice-scoped: dfs.ha.namenodes.<nameservice-id>.

      Attachments

        1. hdfs-2582.txt
          47 kB
          Todd Lipcon
        2. hdfs-2582-v2.txt
          49 kB
          Todd Lipcon
        3. hdfs-2582-v3.txt
          49 kB
          Todd Lipcon
        4. hdfs-2582-v4.txt
          49 kB
          Todd Lipcon

        Activity

          tlipcon Todd Lipcon added a comment -

          Example config under the proposed scheme, for a DN which is part of two federated namespaces, each of which is HA-enabled:

          dfs.federation.nameservices=namespace1,namespace2
          dfs.ha.namenodes.namespace1=ns1-nn1,ns1-nn2
          dfs.ha.namenodes.namespace2=ns2-nn1,ns2-nn2
          dfs.namenode.http-address.namespace1.ns1-nn1=node1.example.com:50070
          

          Not entirely clear still: do namenode IDs need to be unique across federated nameservices? If they do, then why do we need the namespace1 component in the dfs.namenode.http-address... config?

          tlipcon Todd Lipcon added a comment - Example config under the proposed scheme, for a DN which is part of two federated namespaces, each of which is HA-enabled: dfs.federation.nameservices=namespace1,namespace2 dfs.ha.namenodes.namespace1=ns1-nn1,ns1-nn2 dfs.ha.namenodes.namespace2=ns2-nn1,ns2-nn2 dfs.namenode.http-address.namespace1.ns1-nn1=node1.example.com:50070 Not entirely clear still: do namenode IDs need to be unique across federated nameservices? If they do, then why do we need the namespace1 component in the dfs.namenode.http-address... config?
          tlipcon Todd Lipcon added a comment -

          Updated description to clarify terminology ("nameservice" instead of "namespace")

          tlipcon Todd Lipcon added a comment - Updated description to clarify terminology ("nameservice" instead of "namespace")
          eli Eli Collins added a comment -

          You're proposed scheme makes sense to me. Perhaps the namespace was left in the config so each namespace could have its own web UI at some point?

          eli Eli Collins added a comment - You're proposed scheme makes sense to me. Perhaps the namespace was left in the config so each namespace could have its own web UI at some point?
          tlipcon Todd Lipcon added a comment -

          Here's a patch implementing the proposal.

          There's one TODO in there for fixing TestGetConf - I'm working on fixing that now.

          I haven't run all of the unit tests yet, but I did run some of the federation-related tests as well as some of the HA-related tests and they pass. I also ran a selection of general HDFS tests and they pass.

          The patch does introduce some more TODOs in places like the balancer, where we are currently doing some custom RPC connections to NNs – will file a followup JIRA for that when this is committed.

          tlipcon Todd Lipcon added a comment - Here's a patch implementing the proposal. There's one TODO in there for fixing TestGetConf - I'm working on fixing that now. I haven't run all of the unit tests yet, but I did run some of the federation-related tests as well as some of the HA-related tests and they pass. I also ran a selection of general HDFS tests and they pass. The patch does introduce some more TODOs in places like the balancer, where we are currently doing some custom RPC connections to NNs – will file a followup JIRA for that when this is committed.
          tlipcon Todd Lipcon added a comment -

          v2 addresses GetConf

          tlipcon Todd Lipcon added a comment - v2 addresses GetConf
          tlipcon Todd Lipcon added a comment -

          v3 addresses a small NPE I discovered building HDFS-2591 on top of this patch.

          tlipcon Todd Lipcon added a comment - v3 addresses a small NPE I discovered building HDFS-2591 on top of this patch.
          eli Eli Collins added a comment -

          lgtm, +1 with the follow-on jiras.

          The code that uses the new map (Map<ns id, Map<nn id, addr>>) is a little unwieldy since we don't have types for the ns and nn ids. Given that this is a tiny map the readability might be worth the overhead of a class (eg in the same vain of ConfiguredNNAddress). Don't feel strongly so feel free to punt if you disagree.

          eli Eli Collins added a comment - lgtm, +1 with the follow-on jiras. The code that uses the new map (Map<ns id, Map<nn id, addr>>) is a little unwieldy since we don't have types for the ns and nn ids. Given that this is a tiny map the readability might be worth the overhead of a class (eg in the same vain of ConfiguredNNAddress). Don't feel strongly so feel free to punt if you disagree.
          tlipcon Todd Lipcon added a comment -

          Here's v4, which I built a few more patches on top. One more slight fix to the nnId detection stuff, nothing major changed.

          I agree the map<map<..>> is kind of clunky, but I'd like to punt to a follow-on to clean this up, since I have some more patches built on top.

          I'll commit this soon based on Eli's prior +1 unless there are any other comments.

          tlipcon Todd Lipcon added a comment - Here's v4, which I built a few more patches on top. One more slight fix to the nnId detection stuff, nothing major changed. I agree the map<map<..>> is kind of clunky, but I'd like to punt to a follow-on to clean this up, since I have some more patches built on top. I'll commit this soon based on Eli's prior +1 unless there are any other comments.
          eli Eli Collins added a comment -

          +1 to v4, the fix in the delta lgtm

          eli Eli Collins added a comment - +1 to v4, the fix in the delta lgtm
          tlipcon Todd Lipcon added a comment -

          Committed to HDFS-1623 branch. Thanks for the reviews, Eli. If there are further review comments I'll be happy to address post-commit.

          tlipcon Todd Lipcon added a comment - Committed to HDFS-1623 branch. Thanks for the reviews, Eli. If there are further review comments I'll be happy to address post-commit.

          People

            tlipcon Todd Lipcon
            tlipcon Todd Lipcon
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: