Hadoop HDFS / HDFS-11034

Provide a command line tool to clear decommissioned DataNode information from the NameNode without restarting.

Details

• Type: Improvement
• Status: Open
• Priority: Major
• Resolution: Unresolved
• Affects Version/s: None
• Fix Version/s: None
• Component/s: namenode
• Labels: None

Description

Information about decommissioned DataNodes remains tracked in the NameNode for the entire NameNode process lifetime. Currently, the only way to clear this information is to restart the NameNode. This issue proposes to add a way to clear this information online, without requiring a process restart.
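
As a rough illustration of the state involved and of what an online clear operation would do, here is a toy model in Java. This is not the actual NameNode code; real HDFS keeps comparable bookkeeping in its datanode management structures, and every name below is hypothetical.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Toy model of the NameNode's per-DataNode admin-state bookkeeping.
    // All names are illustrative; entries currently survive until the
    // NameNode process restarts.
    class DecommissionedNodeTracker {
        enum AdminState { IN_SERVICE, DECOMMISSIONED }

        // Keyed by DataNode host name.
        private final Map<String, AdminState> nodes = new ConcurrentHashMap<>();

        void markDecommissioned(String host) {
            nodes.put(host, AdminState.DECOMMISSIONED);
        }

        // The operation this issue proposes to expose online: forget a
        // decommissioned node without restarting the NameNode.
        boolean clearDecommissioned(String host) {
            return nodes.remove(host, AdminState.DECOMMISSIONED);
        }
    }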

Activity

Brahma Reddy Battula added a comment -

Can we add an argument to hdfs dfsadmin -refreshNodes itself instead of a new admin command?
Maybe something like clearDeadNodes?
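
A sketch of what wiring that into a dfsadmin-style argument handler could look like. The -clearDeadNodes flag and both helper methods are assumptions for illustration; none of this exists in DFSAdmin today.

    // Hypothetical handling for: hdfs dfsadmin -refreshNodes [-clearDeadNodes]
    class RefreshNodesCommand {
        int run(String[] argv, int i) {
            boolean clear = argv.length > i + 1
                && "-clearDeadNodes".equals(argv[i + 1]);
            reloadHostLists();              // existing -refreshNodes behavior
            if (clear) {
                clearDeadNodeTracking();    // the proposed extra step
            }
            return 0;
        }

        // Stubs standing in for the real refresh logic and the new operation.
        void reloadHostLists() { /* re-read dfs.hosts and dfs.hosts.exclude */ }
        void clearDeadNodeTracking() { /* drop in-memory decommissioned-node entries */ }
    }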

Chris Nauroth added a comment -

Hello Gergely Novák.

If the decommissioned host is removed from the dfs.hosts.exclude file and hdfs dfsadmin -refreshNodes is then run, the host is no longer considered excluded. If the DataNode process is still running, or if it is restarted accidentally, that DataNode will re-register with the NameNode, come back into service, and become a candidate for writing new blocks.

I was imagining a new workflow in which the host remains decommissioned, but the administrator has a way to clear out the in-memory tracked state about that node. It's interesting that you brought up the exclude file. Since that is already the existing mechanism for including and excluding hosts, I wonder if there is a way to enhance it to cover this use case, so that administrators wouldn't need to learn a new command. I'll think about it more (and comments are welcome from others who have ideas too).
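
That distinction, sketched against a simplified model (hypothetical code, not the real registration path): removing a host from the exclude list lets it serve again after re-registration, while the proposed operation only forgets the tracked state and leaves the exclusion in place.

    import java.util.HashSet;
    import java.util.Set;

    // Simplified contrast between the two workflows; all names hypothetical.
    class ExcludeListSketch {
        final Set<String> excludedHosts = new HashSet<>();         // stands in for dfs.hosts.exclude
        final Set<String> trackedDecommissioned = new HashSet<>();

        // Workflow A: host removed from the exclude file + -refreshNodes.
        // A still-running (or restarted) DataNode re-registers and may
        // serve new writes again.
        boolean mayServe(String host) {
            return !excludedHosts.contains(host);
        }

        // Workflow B (this proposal): keep the host excluded and forget only
        // the in-memory tracking entry; the node stays decommissioned.
        void clearTrackingOnly(String host) {
            trackedDecommissioned.remove(host);
            // excludedHosts deliberately untouched.
        }
    }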

Gergely Novák added a comment -

I'm new here. Chris Nauroth, could you please explain the difference between this proposed new dfsadmin command and removing the decommissioned DataNode(s) from the dfs.hosts file and executing hdfs dfsadmin -refreshNodes? Thank you.

Chris Nauroth added a comment -

We can add a new dfsadmin command to clear this state.

It's important to note that for some operations workflows it's valuable to retain the decommissioned node information. If the operator is working through a series of decommission/recommission steps, this information shows which nodes are still in the decommissioned state. That likely means the command line needs to accept an argument for a specific host rather than blindly clearing all decommissioned node information.

Remember to clear the state from both NameNodes in an HA pair.
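
Taken together, a per-host client-side sketch might look like the following. The clear operation itself is hypothetical (no such admin RPC exists in HDFS today), and the host names are placeholders.

    import java.util.List;

    // Hypothetical driver: clear one named host's tracked state on each
    // NameNode of an HA pair, rather than wiping all decommissioned nodes.
    class ClearDecommissionedInfo {
        static void clearOnAllNameNodes(List<String> nameNodeUris, String host) {
            for (String nn : nameNodeUris) {
                // Placeholder for a per-NameNode admin RPC carrying the
                // host argument.
                System.out.printf("would clear %s on %s%n", host, nn);
            }
        }

        public static void main(String[] args) {
            clearOnAllNameNodes(
                List.of("hdfs://nn1.example.com:8020", "hdfs://nn2.example.com:8020"),
                "dn7.example.com");
        }
    }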


People

• Assignee: Unassigned
• Reporter: Chris Nauroth
• Votes: 0
• Watchers: 13
