I think calling this blacklisting will lead to more confusion. As Owen suggested, we can call it decommissioning/recommissioning of trackers, which would essentially mean that irrespective of what state the tracker is in, the jobtracker is asked to decommission (rerun + ignore) or recommission (add back) it. So the commands would be
bin/hadoop jobtracker -decommission tracker1,tracker2.... and bin/hadoop jobtracker -recommission tracker1,tracker2.....
All the running tasks (and also the completed maps) that were launched on that tracker will be killed and rerun. We can reuse the lost-tracker code for doing this. Maybe a thread should be started on demand (similar to the cleanup queue thread) for a decommissioning request. Also, these trackers will be added to the ignore list (i.e. the jobtracker issues a 'shutdown' upon contact). So a decommission request is equivalent to lost-tracker + add-to-ignore-list.
Upon a recommission, the trackers will be removed from the ignore list. This can be done inline.
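To make the "lost-tracker + add-to-ignore-list" idea concrete, here is a rough sketch in plain Java. All names here (TrackerDecommissioner, LostTrackerHandler, shouldShutdown) are made up for illustration and are not existing jobtracker classes or methods; the handler is just a stand-in for whatever the existing lost-tracker code path ends up exposing.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; none of these names exist in the jobtracker code.
class TrackerDecommissioner {

    // Stand-in for the existing lost-tracker handling (kill + rerun tasks).
    interface LostTrackerHandler {
        void handleLostTracker(String trackerName);
    }

    // Trackers that should be told to shut down when they contact the jobtracker.
    private final Set<String> ignoreList = ConcurrentHashMap.newKeySet();
    private final LostTrackerHandler lostTrackerHandler;

    TrackerDecommissioner(LostTrackerHandler lostTrackerHandler) {
        this.lostTrackerHandler = lostTrackerHandler;
    }

    // decommission = add-to-ignore-list + lost-tracker.
    // Returns the trackers that were not already decommissioned.
    List<String> decommission(Collection<String> trackers) {
        List<String> newlyDecommissioned = new ArrayList<>();
        for (String tracker : trackers) {
            // Add to the ignore list first so a concurrent heartbeat is already refused.
            if (ignoreList.add(tracker)) {
                lostTrackerHandler.handleLostTracker(tracker);
                newlyDecommissioned.add(tracker);
            }
        }
        return newlyDecommissioned;
    }

    // recommission = remove from the ignore list; cheap enough to do inline.
    void recommission(Collection<String> trackers) {
        ignoreList.removeAll(trackers);
    }

    // Would be consulted on tracker contact to decide whether to reply with 'shutdown'.
    boolean shouldShutdown(String trackerName) {
        return ignoreList.contains(trackerName);
    }
}
```

In this sketch the lost-tracker work runs inline in decommission(); if it turns out to be slow, the same call could be handed off to the on-demand thread mentioned above without changing the ignore-list handling.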
From the webui, a simple checkbox against each tracker can be provided along with an action named 'Decommission' (similar to the actions for jobs on jobtracker.jsp). On the trackers page, we can provide another section for decommissioned trackers, with a checkbox there for recommissioning them.
1) ACL checks should be done before decommissioning and recommissioning.
2) This info needs to be persisted. Upon every decommission/recommission, persist this info to system.dir/jobtracker.info
3) Upon restart, the ignore list will also be recovered and loaded (i.e. invoke jobtracker.decommission(recovered-list) from the recovery-manager)
4) These new APIs can be added to the TaskTrackerManager interface, as these really are tasktracker-level actions.
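
Points 1) to 3) could fit together roughly as in the sketch below. This is only an illustration under several assumptions: PersistentIgnoreList and AdminAcl are made-up names, the ACL check is reduced to a single isAdmin(user) call, and a local file with one tracker name per line stands in for system.dir/jobtracker.info (the real thing would go through the FileSystem API and whatever format that file already uses).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch only; names and file format are assumptions.
class PersistentIgnoreList {

    // Stand-in for the real ACL check.
    interface AdminAcl {
        boolean isAdmin(String user);
    }

    private final Path infoFile;   // stand-in for system.dir/jobtracker.info
    private final AdminAcl acl;
    private final Set<String> ignored = new TreeSet<>();

    // Point 3: on (re)start, recover the ignore list from the persisted file.
    PersistentIgnoreList(Path infoFile, AdminAcl acl) {
        this.infoFile = infoFile;
        this.acl = acl;
        if (Files.exists(infoFile)) {
            try {
                for (String line : Files.readAllLines(infoFile, StandardCharsets.UTF_8)) {
                    if (!line.trim().isEmpty()) {
                        ignored.add(line.trim());
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    synchronized void decommission(String user, Collection<String> trackers) {
        checkAcl(user);                      // point 1: ACL check first
        if (ignored.addAll(trackers)) {
            persist();                       // point 2: persist on every change
        }
    }

    synchronized void recommission(String user, Collection<String> trackers) {
        checkAcl(user);
        if (ignored.removeAll(trackers)) {
            persist();
        }
    }

    // Returns a copy of the current ignore list.
    synchronized Set<String> ignoredTrackers() {
        return new TreeSet<>(ignored);
    }

    private void checkAcl(String user) {
        if (!acl.isAdmin(user)) {
            throw new SecurityException(user + " may not decommission/recommission trackers");
        }
    }

    // Rewrites the whole file, one tracker name per line.
    private void persist() {
        try {
            Files.write(infoFile, ignored, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here recovery only reloads the names; in the actual proposal the recovery-manager would pass the recovered list to jobtracker.decommission(...) so the lost-tracker handling also runs for them.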