Currently, "slaves" are loaded from ~/.slaves. It would be better to default to something like conf/hadoop-slaves.
Perhaps also split the slaves list, with a different set for "datanodes" vs. "tasktracker" nodes, i.e., conf/hadoop-slaves-tasktracker and conf/hadoop-slaves-datanodes, or some similar split. It may be worth building in the assumption that the tasktracker set is a superset and thus implicitly includes the datanodes, but this might be a bad assumption.
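
A rough sketch of how a script might pick the slaves file per role. The function name slaves_file_for, the HADOOP_CONF_DIR variable, and the fallback order (role-specific file, then conf/hadoop-slaves, then ~/.slaves) are all assumptions made for illustration, not existing behavior:

```shell
#!/bin/sh
# Hypothetical helper: print the slaves file to use for a given role.
# $1 is the role suffix, e.g. "datanodes" or "tasktracker".
# HADOOP_CONF_DIR is an assumed variable; it defaults to ./conf.
slaves_file_for() {
  role="$1"
  conf_dir="${HADOOP_CONF_DIR:-conf}"
  if [ -f "$conf_dir/hadoop-slaves-$role" ]; then
    # Role-specific list, as proposed above.
    echo "$conf_dir/hadoop-slaves-$role"
  elif [ -f "$conf_dir/hadoop-slaves" ]; then
    # Shared list for all roles.
    echo "$conf_dir/hadoop-slaves"
  else
    # Current behavior: fall back to the per-user file.
    echo "$HOME/.slaves"
  fi
}
```

For example, `slaves_file_for datanodes` would print conf/hadoop-slaves-datanodes if that file exists. This sketch does not assume the tasktracker list is a superset of the datanodes list; each role is resolved independently.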
Also, make sure all scripts source something like conf/hadoop-env.sh, so that the user can edit hadoop-env.sh to specify JAVA_HOME or an alternate HADOOP_SLAVES location. It would also be desirable to have a seed CLASSPATH here, possibly named HADOOP_CLASSPATH, to make it explicit and to keep the hadoop scripts from interacting with a CLASSPATH variable that is already set system-wide.
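
A minimal sketch of what each script could do at startup, under stated assumptions: the function names load_hadoop_env and build_classpath and the HADOOP_CONF_DIR variable are hypothetical, and only JAVA_HOME, HADOOP_SLAVES and HADOOP_CLASSPATH come from the proposal above:

```shell
#!/bin/sh
# Hypothetical helper: source conf/hadoop-env.sh if it exists, then
# apply a default for HADOOP_SLAVES when the user did not set one.
load_hadoop_env() {
  conf_dir="${HADOOP_CONF_DIR:-conf}"
  if [ -f "$conf_dir/hadoop-env.sh" ]; then
    # User overrides (JAVA_HOME, HADOOP_SLAVES, HADOOP_CLASSPATH) live here.
    . "$conf_dir/hadoop-env.sh"
  fi
  HADOOP_SLAVES="${HADOOP_SLAVES:-$conf_dir/hadoop-slaves}"
}

# Hypothetical helper: print the classpath to pass to java.
# $1 is the colon-separated list of jars shipped with hadoop.
# Only HADOOP_CLASSPATH is used as a seed; the system CLASSPATH
# variable is deliberately never read.
build_classpath() {
  if [ -n "${HADOOP_CLASSPATH:-}" ]; then
    echo "$HADOOP_CLASSPATH:$1"
  else
    echo "$1"
  fi
}
```

A script would call load_hadoop_env first and then run something like `java -classpath "$(build_classpath "$jars")" ...`, so that a CLASSPATH exported in the user's login environment has no effect on hadoop.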
These changes would probably be useful to the Nutch project, too.