Hadoop Common
HADOOP-442

slaves file should include an 'exclude' section, to prevent "bad" datanodes and tasktrackers from disrupting a cluster

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: conf
    • Labels:
      None

      Description

      I recently had a few nodes go bad, such that they were inaccessible to ssh, but were still running their java processes.
      Tasks that executed on them were failing, causing jobs to fail.
      I couldn't stop the java processes, because of the ssh issue, so I was helpless until I could actually power down these nodes.
      Restarting the cluster doesn't help, even when removing the bad nodes from the slaves file - they just reconnect and are accepted.
      While we plan to prevent tasks from launching on the same nodes over and over, what I'd like is to be able to prevent rogue processes from connecting to the masters.
      Ideally, the slaves file will contain an 'exclude' section, which will list nodes that shouldn't be accessed and should be ignored if they try to connect. That would also help in configuring the slaves file for a large cluster - I'd list the full range of machines in the cluster, then list the ones that are down in the 'exclude' section.

      1. hadoop-442-11.patch
        43 kB
        Wendy Chien
      2. hadoop-442-10.patch
        45 kB
        Wendy Chien
      3. hadoop-442-8.patch
        36 kB
        Wendy Chien

          Activity

          Bryan Pendleton added a comment -

          I too have had problems like this over time.

          Another way to deal with this might be to hand out a new random ID to each instance on startup. The namenode/jobtracker would know the "correct" value, and treat as rogue any nodes which try to register but don't know the current "correct" value. This could be re-set at each full-cluster restart, to make sure that all nodes participating are "up to date" and correct.

          Doug Cutting added a comment -

          The slaves file is currently only used by the start/stop scripts, so it won't help here.

           Perhaps the jobtracker and namenode should have a public API that permits particular hosts to be banned. The web UI could then use this to let administrators ban hosts. We could initialize the list from a config file, in the case of persistently bad hosts.

          Marco Nicosia added a comment -

          Would it be better to choose either one or the other to be authoritative for all operations?

          1] The namenode/jobtrackers maintain the slaves file. Membership and other administrative functions are made via API calls to the process, which modifies a file on disk. That file is used, but never modified, by slaves.sh, etc. If the file is still text, it can be modified between process restarts.

          2] The namenode/jobtrackers observe and respect the contents of a file on disk. Standard tools can modify it, but the processes would have to poll the file to see if it has been changed.

           I personally prefer #1, though I'd hope that any API is open (XML-RPC, REST, SOAP...) instead of RMI, so that any set of sysadmin automation can talk to it.

          eric baldeschwieler added a comment -

          Current proposal:

           • Add a config variable that points to a file containing the list of nodes HDFS should expect (the slaves file) (optional config)
           • Add a config variable that points to a file containing a list of excluded nodes (from the previous list) (optional config)
           • The namenode reads these files on startup (if configured). It keeps a list of included nodes and another of excluded nodes. If the include list is configured, it will be checked when a node registers or heartbeats. If the node is not on the list, it will be told to shut down in the response. If the exclude list is configured, then a node will also be shut down if listed (see the sketch after this list).
           • We will add an admin command to re-read the inclusion and exclusion files
           • The job tracker will also read these lists and have a new admin command to re-read the files
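
           For illustration, a minimal sketch of the acceptance rule this proposal implies, assuming an empty include set stands for "include list not configured". All names below are invented for illustration, not taken from any patch:

             import java.util.Arrays;
             import java.util.Collections;
             import java.util.HashSet;
             import java.util.Set;

             // Hypothetical sketch of the check applied when a node registers or heartbeats.
             class NodeAdmission {
               private final Set<String> include;   // empty set: include list not configured
               private final Set<String> exclude;

               NodeAdmission(Set<String> include, Set<String> exclude) {
                 this.include = include;
                 this.exclude = exclude;
               }

               // Returning false means the response tells the node to shut down.
               boolean accept(String host) {
                 if (!include.isEmpty() && !include.contains(host)) {
                   return false;                    // include list configured and node not on it
                 }
                 return !exclude.contains(host);    // excluded nodes are always rejected
               }

               public static void main(String[] args) {
                 NodeAdmission gate = new NodeAdmission(
                     new HashSet<String>(Arrays.asList("a.example.com", "b.example.com")),
                     Collections.singleton("b.example.com"));
                 System.out.println(gate.accept("a.example.com"));  // true
                 System.out.println(gate.accept("b.example.com"));  // false: excluded
                 System.out.println(gate.accept("c.example.com"));  // false: not on include list
               }
             }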
          Wendy Chien added a comment -

          Here is the latest design (pretty much the same as the previous proposal with a few more details)

          1. Adding two new config variables. By default they will not be configured (commented out of hadoop-default).
          a. hadoop.nodes.include – file which contains nodes to include. If this variable is configured, then only nodes on this list will be allowed to register with the namenode
          b. hadoop.nodes.exclude – file which contains nodes to exclude. If this variable is configured, then any node appearing on this list will be denied communication (register, send heartbeats, send block reports, report received blocks, and send error reports) by the namenode, even if it appears in the include file.

          If neither is configured, then any node is allowed to connect. We currently have a slaves file that is used for slaves.sh. This file can be used as the file specified by hadoop.nodes.include, but there is no restriction that it needs to be.

           2. Adding a dfsadmin command (refreshNodes) to reread the inclusion/exclusion files (a small sketch of the corresponding RPC follows after item 4).

          3. The files will be read when the NameNode starts up and whenever it gets a refreshNodes command.

          4. JobTracker will use the same config variables to determine TaskTrackers to include/exclude.
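
           Regarding item 2, the admin command presumably boils down to one new RPC on the namenode's client-facing interface plus a dfsadmin subcommand that invokes it. A minimal sketch under that assumption (the interface and class names here are made up; later comments settle on a void refreshNodes() on ClientProtocol):

             import java.io.IOException;

             // Hypothetical admin-facing interface; only the refreshNodes idea comes from this design.
             interface NodeAdminProtocol {
               void refreshNodes() throws IOException;   // ask the namenode to re-read its host files
             }

             // Roughly what the dfsadmin subcommand would do with it.
             class RefreshNodesCommand {
               static int run(NodeAdminProtocol namenode) {
                 try {
                   namenode.refreshNodes();
                   return 0;
                 } catch (IOException e) {
                   System.err.println("refreshNodes failed: " + e.getMessage());
                   return -1;
                 }
               }
             }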

          Yoram Arnon added a comment -

          Yes, right on!


          Wendy Chien added a comment -

          Some changes:
          1. There are actually going to be 4 new config variables. 2 for dfs and 2 for mapred. The current names are:
          dfs.client.include
          dfs.client.exclude
          mapred.client.include
          mapred.client.exclude

          -These variables are going to be commented out in hadoop-default, so by default they will not be configured.

           2. I created a generic gatekeeper class in hadoop.util which is used by dfs and mapred (a rough sketch of the idea appears at the end of this comment).

           3. Currently, the include list will only be looked at for registration. It will not be used for heartbeats, block reports, etc. The exclude list will be used for everything. The reason for this is that if a node can't register, then it can't send heartbeats, so we don't need to check it again.

          4. The JobTracker will not have an admin command to refresh the nodes because such a mechanism does not yet exist.

          Comments welcome.
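
           As an illustration of what a shared gatekeeper utility along the lines of item 2 could look like - the class, package and method names below are invented for this sketch and are not taken from the attached patches:

             import java.io.BufferedReader;
             import java.io.FileReader;
             import java.io.IOException;
             import java.util.HashSet;
             import java.util.Set;

             // Hypothetical shared utility: holds the include and exclude host sets,
             // read from whitespace-separated files, and can re-read them on demand.
             class HostListGatekeeper {
               private final String includeFile;   // empty string: list not configured
               private final String excludeFile;
               private Set<String> includes = new HashSet<String>();
               private Set<String> excludes = new HashSet<String>();

               HostListGatekeeper(String includeFile, String excludeFile) throws IOException {
                 this.includeFile = includeFile;
                 this.excludeFile = excludeFile;
                 refresh();
               }

               // Re-read both files, e.g. on startup or on an admin refresh request.
               synchronized void refresh() throws IOException {
                 includes = readHostFile(includeFile);
                 excludes = readHostFile(excludeFile);
               }

               synchronized boolean inIncludes(String host) {
                 return includes.isEmpty() || includes.contains(host);   // empty list admits everyone
               }

               synchronized boolean inExcludes(String host) {
                 return excludes.contains(host);
               }

               private static Set<String> readHostFile(String name) throws IOException {
                 Set<String> hosts = new HashSet<String>();
                 if (name == null || name.length() == 0) return hosts;   // not configured
                 BufferedReader in = new BufferedReader(new FileReader(name));
                 try {
                   String line;
                   while ((line = in.readLine()) != null) {
                     for (String token : line.trim().split("\\s+")) {
                       if (token.length() > 0) hosts.add(token);
                     }
                   }
                 } finally {
                   in.close();
                 }
                 return hosts;
               }
             }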

          dhruba borthakur added a comment -

          Nice and simple design. I have one comment. I would like to avoid every heartbeat checking the exclude list. We already know that heartbeat-processing should be very lightweight for the cluster to scale. Can we make the heartbeat check the exclude list only if it is the first heartbeat received after the last invocation of "dfsadmin refreshNodes" command?
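
           One way to read that suggestion, sketched with invented names (none of this is from the patch): stamp the exclude list with a generation counter that refreshNodes bumps, record per datanode which generation it was last checked against, and have the heartbeat path consult the exclude list only when the two differ.

             import java.util.concurrent.atomic.AtomicLong;

             // Illustrative throttle for the exclude-list check on heartbeats.
             class ExcludeCheckThrottle {
               private final AtomicLong excludeGeneration = new AtomicLong(0);

               void onRefreshNodes() {
                 excludeGeneration.incrementAndGet();   // a refresh invalidates earlier checks
               }

               // 'lastChecked' would be stored on the per-datanode descriptor; the heartbeat
               // handler skips the exclude lookup unless the generation has moved since then.
               boolean needsExcludeCheck(long lastChecked) {
                 return lastChecked != excludeGeneration.get();
               }

               long currentGeneration() {
                 return excludeGeneration.get();
               }
             }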

          Sameer Paranjpye added a comment -

          A couple of points/questions:

          • The [dfs|mapred].client prefixes are a little confusing. They could be construed as lists of machines from which HDFS clients are permitted to connect, which is not the intent here. I'd also get rid of the 'include' suffix since that list is used to constrain the set of nodes that can participate. I propose calling them:
            [dfs|mapred].hosts and [dfs|mapred].hosts.exclude
           • What is the format of the include and exclude files? It needs to be specified here.
          • The variables should be set to reasonable default values in hadoop-default.xml, not excluded. Setting them to empty strings by default is probably fine.
          • What happens when a node is taken off the include list and not placed on the exclude list? Do heartbeats and block reports from it continue to get processed? Perhaps the 'refreshNodes' command should examine the collection of DatanodeDescriptors and update it to only include those that are on the new include list.
          • What are the semantics of exclude? How does it differ from decommission? Can we unify the two and have exclude mean decommission? What happens if all replicas of some blocks are excluded?
          Yoram Arnon added a comment -

           So perhaps a simpler approach, more in line with the original request:

          • single slaves file
          • no new config variables, since the slaves file is configurable anyway via the hadoop script(s) and env vars
          • lines in the slaves file that start with the word 'exclude' imply that the nodes listed on that line are to be excluded

          pros of this approach:

          • completely backwards compatible
          • very simple for the administrator (and implementor) - no config, no confusion, single slaves file to edit

           The intent was to simplify the administrator's life. I'd list the whole cluster in the slaves file, then list a handful of nodes under an 'exclude' section. The exclude statement is an override on whatever is listed elsewhere in the file.
          The behavior is:

          • datanode/tasktracker on nodes that appear in the slaves file and are not excluded are launched on startup
          • any node may connect to the master, unless it is excluded (for backwards compatibility. I'd actually prefer to only allow included nodes to connect but that will wait, perhaps indefinitely)

          nodes are excluded for a reason, basically because they're dead and

          • their presence clutters up the startup/shutdown script output and error logs
          • if they are included in the cluster they may degrade its behavior or performance so they should be actively ignored
          • I wouldn't worry about the replicas of data they contain - they're long dead and re-replicated. If they're alive, I won't exclude them.

          I love the admin command to reread the file.

          Wendy Chien added a comment -

          -I'll change the names to the ones Sameer proposed and make the default an empty string.

          -The exclude/hosts files expect full hostnames listed, separated by whitespace (similar to the slaves file). I can refine this if it's too cumbersome.
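
           For example, either file would just look like this (hostnames made up):

             datanode101.example.com datanode102.example.com
             datanode103.example.com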

          -Some of us had a discussion about exclude and decommission. The conclusion is that they should be combined.

          Below is the new design:

          1. Namenode behavior in dealing with exclude/hosts lists.
           a. When the namenode starts up, it reads the hosts and exclude files. When a node on the exclude list registers with the namenode, we mark it to be decommissioned. When nodes are done being decommissioned, they are shut down. Nodes not on the include list will not be allowed to register.
           b. When the namenode gets a refreshNodes command, it will update the hosts and exclude lists. If a node is added to the hosts list, then it will be allowed to register. If a node is removed from the hosts list, then any further communication will be disallowed and it will be asked to shut down. If a node is added to the exclude list, then it will start to be decommissioned. If a node is removed from the exclude list, then the decommission process will be stopped (see the sketch at the end of this comment).

          2. Decommissioning a node behaves slightly differently from before.
          a. When a node is being decommissioned, we do not want to use its copies to replicate unless no copies exist on non-decommissioned nodes.
          b. A new thread will be used to periodically check if a node is done being decommissioned. If it is, the next time the node heartbeats, it will be told to shut down.

          Comments welcome (and please let me know if I forgot anything).
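
           To make 1.b concrete, a rough sketch of what applying the re-read lists to the live datanodes could look like. The DatanodeView interface and every name below are invented for illustration; the actual patch works against FSNamesystem and DatanodeDescriptor:

             import java.util.Set;

             // Minimal view of a registered datanode, for the purposes of this sketch only.
             interface DatanodeView {
               String getHost();
               boolean isDecommissionInProgress();
               void startDecommission();   // begin re-replicating its blocks elsewhere
               void stopDecommission();    // node was taken back off the exclude list
               void markDisallowed();      // further communication refused; told to shut down
             }

             class RefreshNodesSketch {
               // Called after the hosts and exclude files have been re-read from disk.
               static void apply(Iterable<DatanodeView> datanodes,
                                 Set<String> hosts, Set<String> exclude) {
                 for (DatanodeView node : datanodes) {
                   String host = node.getHost();
                   if (!hosts.isEmpty() && !hosts.contains(host)) {
                     node.markDisallowed();              // removed from the hosts list
                   } else if (exclude.contains(host)) {
                     if (!node.isDecommissionInProgress()) {
                       node.startDecommission();         // newly excluded: decommission it
                     }
                   } else if (node.isDecommissionInProgress()) {
                     node.stopDecommission();            // removed from the exclude list
                   }
                 }
               }
             }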

          Wendy Chien added a comment -

           To combine exclude and decommission, I've taken out the dfsadmin command decommission. Starting and stopping decommission is now done through the configured files. The last option in the original decommission command was one to see the state of the decommission for a set of nodes. Should this option be combined with refreshNodes, or should it still have its own command (especially if the decommission state is expanded to show progress or if we want to also show other states for the nodes)?

          Wendy Chien added a comment -

          I still need to make changes to TestDecommission, but I wanted to put up a patch for reviews.

          dhruba borthakur added a comment -

          1. It would be nice if the description of dfs.hosts and dfs.hosts.exclude
          says "Full path name of file ..."

          2. The FSNamesystem.close() function should have a dnthread.join() call.

          3. Can we make FSNamesystem.refreshNodes() package private? i.e. remove the
          "public" keyword from its definition.

           4. The method FSNamesystem.refreshNodes might need to be synchronized because
           it traverses the datanodeMap. However, the first line in this method (the one that
           invokes "hostReader.refresh") should preferably be outside this synchronization.
          It is good to read in contents from the hosts file outside the global
          FSNamesystem lock.

          5. The methods inExcludedHostsList() and inHostsList() could be unified if we
          pass in the specified list as a parameter to this unified method.

          Wendy Chien added a comment -

          I've attached a new patch. This one also has the changes for TestDecommission and incorporates Dhruba's comments.

          > 1. It would be nice if the description of dfs.hosts and dfs.hosts.exclude says "Full path name of file ..."
          Done.

          > 2. The FSNamesystem.close() function should have a dnthread.join() call.
          Done.

          > 3. Can we make FSNamesystem.refreshNodes() package private? i.e. remove the "public" keyword from its definition.
          Done.

           > 4. The method FSNamesystem.refreshNodes might need to be synchronized
          Done.

          > 5. The methods inExcludedHostsList() and inHostsList() could be unified
          They are actually a little different because inHostsList returns true if the list is empty too.

          6. Dhruba and I also discussed removing the STOPPED state that I added from DatanodeInfo since it can be merged with DECOMMISSIONED.

          dhruba borthakur added a comment -

          0. Nice work on removing the STOPPED state. Makes the system less complex.

          1. In TestDecommission.waitNodeState, we wait for half second per iteration.
          If this setting causes many prints of the form "Waiting for node to change.."
          then maybe we can change the wait period to 1 second instead of half second.

          2. In TestDecommission, we have the following logic:
          a. decommissionNode()
          b. waitNodeState(DECOMMISSION_INPROGRESS)
          c. commissionNode()
          d. waitNodeState(NORMAL)
          e. decommissionNode()
          f. waitNodeState(DECOMMISSIONED)
          g. checkFile()

          I guess b. should be waitNodeState(DECOMMISSIONED)
           Also, we can sneak in a call to checkFile() between steps b and c.

           3. Maybe ClientProtocol.refreshNodes can return void instead of
           "boolean".

          4. The original way to shutdown a datanode was to make the namenode send a
           DNA_SHUTDOWN command in response to a heartbeat. You enhanced this logic
          to make the datanode catch a UnregisteredDatanodeException and shut itself
          down. Thus, we will now have two ways to shutdown a datanode. Do you think
          that it is preferable to have only one way to shutdown a datanode?

          5. An earlier patch introduced an async thread called the ReplicationMonitor.
          The ReplicationMonitor thread invokes checkDecommissionState(). This
          probably means that the DecommissionedMonitor thread is not needed anymore.

          6. The FSNamesystem.verifyNodeRegistration needs to be synchronized since it
          traverses hosts lists and datanode lists. Since
          FSNamesystem.startDecommission and FSNamesystem.stopDecommission are private
          and are always called from already-synchronized methods, we can remove
          the "synchronized" keyword from their definitions.

          7. I am kind-of reluctant to put in added complexity to pendingTransfers
          to handle the case that a decommissioned node should not be the
          source of a replication request. Especially because the above rule is
          not enforced if there is only one replica in the cluster. Is there any
          way that we can avoid putting in this special purpose case? What bad
          thing can occur if we select the decommissioned node as the source of
          a replication request?

          8. Ideally, the FSNamesystem.verifyNodeShutdown needs to be synchronized
          because it looks up the datanode descriptor. But this method is called
          for every RPC request and a "synchronized" call just adds overhead.
          Maybe we can make FSNamesystem.gotHeartBeat, FSNamesystem.processReport
          and FSNamesystem.blockReceived call FSNamesystem.verifyNodeShutdown.

          dhruba borthakur added a comment -

           Regarding comment 5 above, it actually might make sense to have a separate thread to check whether a decommission is completed or not. It can run on its own schedule. The ReplicationMonitor thread runs every 3 seconds, and this periodicity is "too" frequent for checking decommissioned nodes.

          Wendy Chien added a comment -

          Thanks for looking over the patch, Dhruba! I updated it to incorporate Dhruba's comments.

          1. TestDecommission.waitNodeState waits for 1 second now.
           2. I do mean to check for DECOMMISSION_INPROGRESS to make sure the decommission began, but I want to stop it before it finishes so I can test that commissioning a node works too.
          3. refreshNodes now returns void.
          4. UnregisteredDatanodeException was already there, but I also added DisallowedDatanodeException to that clause. I'm inclined to leave them together since they are similar.
          6. added synchronized to verifyNodeRegistration, and removed it from start/stopDecommission.
          7. removed the new code from pendingTransfers
          8. I moved verifyNodeShutdown to FSNamesystem.

          Doug Cutting added a comment -

          I just committed this. Thanks, Wendy!


            People

             • Assignee:
               Wendy Chien
             • Reporter:
               Yoram Arnon
