Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
- Affects Version/s: 1.2.1
- Fix Version/s: None
- Component/s: None
- Environment:
  GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
  Linux foo 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
  AFS file system.
Description
start-dfs.sh fails to start the remote data nodes and task nodes, though it is possible to start them manually through hadoop-daemon.sh.
I have been able to debug the problem and find the root cause. It looks like a trivial fix, but I have not been able to figure out how to make it.
hadoop-daemons.sh calls slaves.sh:
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
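For context, a minimal sketch of how the escaped semicolon travels (the `show_args` function is hypothetical, not part of Hadoop): the local shell's quote removal turns `\;` into a plain `;` argument, so the invoked script receives it as an ordinary positional parameter rather than as a command terminator.

```shell
# Hypothetical demo function (not part of Hadoop): print each positional
# parameter on its own line so argument boundaries are visible.
show_args() {
  printf '[%s]\n' "$@"
}

# The backslash keeps ';' from splitting the local command line, so the
# semicolon is passed through as a single literal argument.
show_args cd /tmp \; ls
# → [cd]
#   [/tmp]
#   [;]
#   [ls]
```

slaves.sh then hands that literal `;` (along with the rest of `"$@"`) to ssh, and it is the remote shell that is supposed to interpret it as a command separator.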
This is the issue I see when I debug with bash -x: in slaves.sh, the \; becomes ';':
+ ssh xxxx.xx.xxxx.xxx cd /afs/xx.xxxx.xxx/x/x/x/xx/xxxxx/libexec/.. ';' /afs/xx.xxxx.xxx/x/x/x/xx/xxxx/bin/hadoop-daemon.sh --config /afs/xx.xxxx.xxx/x/x/x/xx/xxxx/libexec/../conf start datanode
The problem is the ';'. Because the semicolon is surrounded by quotes, the command after it does not execute. I manually ran the command above and, as expected, the data node did not start. When I removed the quotes around the semicolon, everything worked. Note that the issue is visible only under bash -x; if you echo the statement, the quotes around the semicolon do not appear.
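One point worth noting (a sketch of shell behavior, not taken from the report): the quotes shown in the bash -x trace are only display notation. After the shell's own quote removal, ';' and \; produce byte-identical arguments; what changes the meaning is copy-pasting the traced line into an interactive shell, where the quotes become real input.

```shell
# Sketch: after quote removal, an escaped and a quoted semicolon are the
# same byte, so a program invoked with either receives identical arguments.
a=$(printf '%s' \;)
b=$(printf '%s' ';')
[ "$a" = "$b" ] && echo identical
# → identical
```

Under `set -x`, bash re-quotes arguments containing metacharacters purely so the trace is unambiguous to read, which is why the `;` appears quoted in the trace but not when the statement is echoed.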
This issue is always reproducible for me, and because of it I have to start the daemons manually on each machine.