Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • HA branch (HDFS-1623)
    • None
    • documentation, ha
    • None

    Description

      We've added a few configs, like shared edits dir, dfs.ha.namenodes, etc - we should probably add these to hdfs-default.xml so they get documented.

      Attachments

        1. hdfs-2819.txt
          18 kB
          Eli Collins
        2. hdfs-2819.txt
          17 kB
          Eli Collins
        3. hdfs-2819.txt
          5 kB
          Eli Collins
        4. hdfs-2819-ammend.txt
          4 kB
          Eli Collins

        Activity

          harip Hari Mankude added a comment -

          This is a very good idea. Please document the information.

          eli2 Eli Collins added a comment -

          Patch attached.

          • Removes "namenode" from common keys for consistency
          • Does not include dfs.ha.allow.stale.reads, whether the StandbyNode allows read operations, since this is currently only used for testing purposes.

          The configuration keys w/o defaults will be documented in HDFS-2733. Here's the list for the curious. Please chime in on HDFS-2885 if you agree we should remove "federation" from the nameservice configs below since they may be used outside federation.

          dfs.ha.fencing.methods - List of fencing methods to use for service fencing. May contain builtin methods (eg shell and sshfence) or user-defined method.

          dfs.ha.fencing.ssh.private-key-files - The SSH private key files to use with the builtin sshfence fencer.

          dfs.federation.nameservices - The list of nameservices.

          dfs.federation.nameservice.id - The ID of this nameservice. If the nameservice ID is not configured it is determined automatically by matching the local node's address with the configured address.

          dfs.ha.namenodes - The prefix for a given nameservice, lists namenodes for a given nameservice.

          dfs.ha.namenode.id - The ID of this namenode. If the namenode ID is not configured it is determined automatically by matching the local node's address with the configured address.
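
To make the prefix-style key concrete, here is a minimal sketch of how a full key might be formed from the dfs.ha.namenodes prefix plus a nameservice ID, and how its comma-separated value could be split into namenode IDs. It assumes java.util.Properties as a stand-in for Hadoop's Configuration; the class HaKeySketch, its methods, and the IDs ns1, nn1 and nn2 are hypothetical and not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative only: java.util.Properties stands in for Hadoop's Configuration.
class HaKeySketch {
    // Prefix quoted in the list above; it is not a usable key by itself.
    static final String NAMENODES_PREFIX = "dfs.ha.namenodes";

    // Builds the full key for one nameservice, e.g. "dfs.ha.namenodes.ns1".
    static String namenodesKey(String nameserviceId) {
        return NAMENODES_PREFIX + "." + nameserviceId;
    }

    // Returns the namenode IDs listed for a nameservice, or an empty list if the key is unset.
    static List<String> namenodeIds(Properties conf, String nameserviceId) {
        List<String> ids = new ArrayList<>();
        String raw = conf.getProperty(namenodesKey(nameserviceId));
        if (raw == null) {
            return ids;
        }
        for (String part : raw.split(",")) {
            String id = part.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("dfs.ha.namenodes.ns1", "nn1,nn2"); // hypothetical IDs
        System.out.println(namenodesKey("ns1") + " -> " + namenodeIds(conf, "ns1"));
    }
}
```

The point of the sketch is only that the prefix alone looks up nothing; a nameservice ID has to be appended before the key refers to a list of namenodes.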


          umamaheswararao Uma Maheswara Rao G added a comment -

          Hi Eli,
          Thanks a lot for documenting all the configurations.

          Here is the list of HA-related configurations I came across.

            public static final String  DFS_CLIENT_FAILOVER_PROXY_PROVIDER_KEY_PREFIX = "dfs.client.failover.proxy.provider";
            public static final String  DFS_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = "dfs.client.failover.max.attempts";
            public static final int     DFS_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = 15;
            public static final String  DFS_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = "dfs.client.failover.sleep.base.millis";
            public static final int     DFS_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT = 500;
            public static final String  DFS_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = "dfs.client.failover.sleep.max.millis";
            public static final int     DFS_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = 15000;
            public static final String  DFS_CLIENT_FAILOVER_CONNECTION_RETRIES_KEY = "dfs.client.failover.connection.retries";
            public static final int     DFS_CLIENT_FAILOVER_CONNECTION_RETRIES_DEFAULT = 0;
            public static final String  DFS_CLIENT_FAILOVER_CONNECTION_RETRIES_ON_SOCKET_TIMEOUTS_KEY = "dfs.client.failover.connection.retries.on.timeouts";
            public static final int     DFS_CLIENT_FAILOVER_CONNECTION_RETRIES_ON_SOCKET_TIMEOUTS_DEFAULT = 0;

          BTW, can we move the above configuration keys under the comment '// HA related configuration' in the DFSConfigKeys class?
          eli2 Eli Collins added a comment -

          Thanks for the feedback Uma. Updated patch.

          • Added _PREFIX to DFS_HA_NAMENODES_KEY since it's a prefix to be consistent with other key names
          • Added "HA related configuration" comment to the section of client HA related keys
          • Added values for dfs.client.failover options. I marked them expert only as users should not have to modify these values.
          tlipcon Todd Lipcon added a comment -
          +    often the logs are rolled. Note that failover triggers a log roll
          +    so the StandbyNode quickly becomes up-to-date.
          

          I think it's better to say "so the StandbyNode will always be up to date before it becomes Active".


          • I think dfs.ha.standby.checkpoints can be left undocumented - I only added it for testing purposes, but I can't think of any reason why you wouldn't want it (it's currently the only way to do checkpoints in HA!)
          • I don't follow why dfs.ha.fencing.methods, dfs.ha.fencing.ssh.private-key-files, dfs.federation.nameservices, etc aren't documented in -default. It's OK to have them there with a blank <value></value>. I agree that the ones that are just prefixes (eg dfs.ha.namenodes) need to be documented elsewhere, unless you put them there as "dfs.ha.namenodes.EXAMPLENAMESERVICE".
          eli2 Eli Collins added a comment -

          Updated patch.

          #1 Done
          #2 Removed it. I figured it would be used if/when we enable 2NN for HA NNs, which is not likely to happen so agree it can be removed.
          #3 Added them. I assumed the full config documentation was going to be covered by HDFS-2819 but no harm putting them here as well.

          eli2 Eli Collins added a comment -

          Oops, meant HDFS-2733 in that last comment.

          tlipcon Todd Lipcon added a comment -

          +1

          eli2 Eli Collins added a comment -

          Thanks for the reviews Uma and Todd. I've committed this.

          hudson Hudson added a comment -

          Integrated in Hadoop-Hdfs-HAbranch-build #70 (See https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/70/)
          HDFS-2819. Document new HA-related configs in hdfs-default.xml. Contributed by Eli Collins

          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1240914
          Files :

          • /hadoop/common/branches/HDFS-1623/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/NodeFencer.java
          • /hadoop/common/branches/HDFS-1623/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/SshFenceByTcpPort.java
          • /hadoop/common/branches/HDFS-1623/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HAUtil.java
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/HATestUtil.java
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHAConfiguration.java
          • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdmin.java

          sureshms Suresh Srinivas added a comment -

          Eli, thanks for documenting the config.

          Some comments:

          1. Why are we changing the name to DFS_HA_NAMENODES_KEY_PREFIX?
          2. I am not sure what you mean by "prefix for given nameservice". Also please mention that the value contains a list of comma-separated namenodes.
            +<property>
            +  <name>dfs.ha.namenodes.EXAMPLENAMESERVICE</name>
            +  <value></value>
            +  <description>
            +    The prefix for a given nameservice, contains a comma-separated
            +    list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
            +  </description>
            +</property>

          3. Does the existing code that finds the nameservice ID work with this config added? Is this not equivalent to adding an empty nameservice ID? Not sure how our config behaves for <value></value>.
            +<property>
            +  <name>dfs.federation.nameservice.id</name>
            +  <value></value>
            +  <description>
            +    The ID of this nameservice. If the nameservice ID is not
            +    configured or more than one nameservice is configured for
            +    dfs.federation.nameservices it is determined automatically by
            +    matching the local node's address with the configured address.
            +  </description>
            +</property>

          4. Please comment out the properties dfs.federation.nameservices and dfs.federation.nameservice.id.
          5. Can you please describe what 0 means for users? Same for dfs.client.failover.connection.retries. Users may not understand what a failover IPC client means.
            +<property>
            +  <name>dfs.client.failover.connection.retries.on.timeouts</name>
            +  <value>0</value>
            +  <description>
            +    Expert only. The number of retry attempts a failover IPC client
            +    will make on socket timeout when establishing a server connection.
            +  </description>
            +</property>

          6. For client.failover, are the default retry attempts and timeouts correct? I am not sure I understand the rationale for these timeouts. The failover attempts are made for ~ 165 seconds in the following sequence ~(0.5 1 2 4 8 15 15 15 15 15 15 15 15 15 15)
          7. From a config perspective, the client.failover config items are confusing. There are configs related to failover.max.attempts and failover.connection.retries. Not sure if the description helps understand the difference.
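
The sequence quoted in item 6 can be reproduced with a small sketch of capped exponential backoff using the defaults listed earlier in the thread (base 500 ms, max 15000 ms, 15 attempts). This assumes plain doubling with a cap, no jitter, and a wait before every attempt, which is how the comment reads the defaults; it is not Hadoop's retry code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: reproduces the arithmetic in the comment, not Hadoop's retry policy.
class FailoverBackoffSketch {
    // Delay for a 0-based attempt: the base doubled once per earlier attempt, capped at maxMillis.
    static long delayMillis(long baseMillis, long maxMillis, int attempt) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    // Builds the list of delays for the given number of attempts.
    static List<Long> schedule(long baseMillis, long maxMillis, int attempts) {
        List<Long> delays = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            delays.add(delayMillis(baseMillis, maxMillis, i));
        }
        return delays;
    }

    // Sums the delays to get the total time spent waiting.
    static long totalMillis(List<Long> delays) {
        long total = 0;
        for (long d : delays) {
            total += d;
        }
        return total;
    }

    public static void main(String[] args) {
        // Defaults quoted in the thread: 500 ms base, 15000 ms max, 15 attempts.
        List<Long> delays = schedule(500, 15000, 15);
        System.out.println(delays);
        System.out.println("total ms: " + totalMillis(delays));
    }
}
```

Under these assumptions the delays sum to 165500 ms, in line with the "~ 165 seconds" estimate in the comment.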
          eli2 Eli Collins added a comment -

          Thanks for the review Suresh. Comments below and updated patch attached (hdfs-2819-ammend.txt)

          #1 Because it is the prefix for a key rather than a key itself (ie you can't use it by itself to look up anything). This prefix plus a suffix (nameservice ID) will result in a key that refers to a set of namenodes. The naming is consistent with other variables that use _PREFIX.
          #2 "dfs.ha.namenodes" is the prefix for a given nameservice, eg "dfs.ha.namenodes.EXAMPLENAMESERVICE". This description already says "contains a comma-separated list of namenodes", maybe you were thinking of another key?
          #3 Yes, empty values are parsed as null. Note that a value with whitespace is not, ie "<value> </value>" here would not be kosher.
          #4 I added them per Todd's request above, disagree w his thinking?
          #5 These values are used to set "ipc.client.connect.max.retries" and "ipc.client.connect.max.retries.on.timeouts" respectively for the failover rpc proxy. I updated the description with the rationale for the 0 default (failover effectively means the clients do retry). These are marked "Expert only" because we don't expect most users to modify them or need to understand them.
          #6 The base time is 500ms and we don't wait on the first retry so the sequence is 0, 1s, 2s, 4s, 8s, .. (up to 15 retries, the last base value caps at 8s, though note that the 5th to 15th values, like the others, will vary by +/- 50% each time, so could delay up to 12s). Make sense?
          #7 Not sure I follow, do you have a specific suggestion? I marked these as "Expert only" because we don't expect most users to modify or need to understand them.
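
The mapping described in #5 can be sketched as follows: the two failover connection-retry settings are copied onto the IPC client keys in the configuration handed to the failover proxy. This is an illustration under stated assumptions, with java.util.Properties standing in for Hadoop's Configuration and a hypothetical class and method name; the key names and the 0 defaults are the ones quoted in this thread.

```java
import java.util.Properties;

// Illustrative only: java.util.Properties stands in for Hadoop's Configuration.
class FailoverProxyConfSketch {
    // Returns a copy of conf in which the IPC retry keys are overridden
    // by the failover-specific settings (both default to 0 in this thread).
    static Properties forFailoverProxy(Properties conf) {
        Properties copy = new Properties();
        copy.putAll(conf);
        copy.setProperty("ipc.client.connect.max.retries",
                conf.getProperty("dfs.client.failover.connection.retries", "0"));
        copy.setProperty("ipc.client.connect.max.retries.on.timeouts",
                conf.getProperty("dfs.client.failover.connection.retries.on.timeouts", "0"));
        return copy;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("ipc.client.connect.max.retries", "10"); // hypothetical site setting
        Properties proxyConf = forFailoverProxy(conf);
        System.out.println(proxyConf.getProperty("ipc.client.connect.max.retries"));
    }
}
```

Working on a copy keeps the caller's settings untouched, so only the failover proxy sees the overridden retry counts.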


          People

            eli Eli Collins
            tlipcon Todd Lipcon