      Description

      We should document how to configure, set up, and monitor automatic failover. This requires adding the new configuration properties to the *-default.xml files and adding prose to the "apt" docs.

      1. HDFSHighAvailability.html
        43 kB
        Todd Lipcon
      2. hdfs-3159.txt
        15 kB
        Todd Lipcon
      3. hdfs-3159.txt
        16 kB
        Todd Lipcon
      4. delta.txt
        6 kB
        Todd Lipcon

        Activity

        Todd Lipcon added a comment -

        Attaching patch as well as the rendered HTML file.

        Eli Collins added a comment -

        Looks good, Todd. Minor comments:

        • The instructions say dfs.ha.automatic-failover.enabled is in core-site.xml; it should be hdfs-site.xml
        • "notifying the other machines" -> "notifying the other NameNode"
        • "active node" -> "active NameNode"
        • "acts as a client of ZooKeeper" -> "is a ZooKeeper client"
        • "has a ZKFC running on the same machine" -> "has a ZKFC that runs on the same machine as the NameNode"
        • "special health-check command" -> "health-check command"
        • Use "local NameNode" instead of "local node" in most places, e.g. to avoid confusion with "node" in the ZooKeeper sense (e.g. "lock node")
        • "users may refer" -> "refer"
        • "within 10-20 seconds": worth mentioning "ha.zookeeper.session-timeout.ms" instead of a raw value
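
        For reference, the two properties named in the comments above might be set roughly as follows. This is only a sketch: the property names are the ones quoted in this thread, placing ha.zookeeper.session-timeout.ms in core-site.xml is an assumption based on its ha.* prefix, and the timeout value shown is an illustrative placeholder rather than a recommended or default setting.

        ```xml
        <!-- hdfs-site.xml: enable automatic failover (per the first review comment) -->
        <property>
          <name>dfs.ha.automatic-failover.enabled</name>
          <value>true</value>
        </property>

        <!-- core-site.xml (assumed location): ZooKeeper session timeout, which
             bounds how quickly a failed active NameNode is detected.
             The value below is a placeholder for illustration only. -->
        <property>
          <name>ha.zookeeper.session-timeout.ms</name>
          <value>5000</value>
        </property>
        ```

        Referring to the timeout property by name, as suggested above, keeps the docs accurate if the setting is changed from whatever value a cluster ships with.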
        Aaron T. Myers added a comment -

        It's not introduced by your change, but you might as well fix it here:

        "Prior to Hadoop 0.23.2, the NameNode was a single point of failure (SPOF) in an HDFS cluster." - this actually won't appear in 0.23.2, and instead will first show up in 2.0.0.

        Todd Lipcon added a comment -

        The attached new patch addresses the above feedback. I also attached a delta for your reviewing pleasure.

        Eli Collins added a comment -

        +1 looks great

        Todd Lipcon added a comment -

        Committed to branch. Thanks for the reviews.


          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon