I would very much like the option of submitting jobs from a workstation outside EC2 to a Hadoop cluster running in EC2. This has been explored here:
The net result is that we can make this work (in combination with a SOCKS proxy) with a couple of changes to the EC2 scripts:
a) use the public hostname for the fs.default.name setting (instead of the private hostname being used currently)
b) mark hadoop.rpc.socket.factory.class.default as final in the generated hadoop-site.xml (the one that applies to the server side), so that a SOCKS socket factory set in a client's job configuration cannot override it on the cluster
#a has no downside as far as I can tell, since public hostnames resolve to internal/private IP addresses within EC2 (so traffic is still optimally routed).
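To make the two changes concrete, here is a rough sketch of what the relevant part of the generated server-side hadoop-site.xml might look like. The hostname is a placeholder and the port is an assumption, not taken from the scripts; the socket factory value shown is the stock Hadoop default.

```xml
<configuration>
  <!-- (a) public hostname instead of the private one.
       Hostname is a placeholder; port 50001 is an assumption. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:50001/</value>
  </property>

  <!-- (b) final, so a job configuration shipped from the client
       cannot replace the socket factory on the server side. -->
  <property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <value>org.apache.hadoop.net.StandardSocketFactory</value>
    <final>true</final>
  </property>
</configuration>
```

On the workstation side, the client configuration would then point RPC traffic at the SOCKS proxy, along these lines (the proxy address localhost:6666 is hypothetical, e.g. a tunnel opened with ssh -D):

```xml
<configuration>
  <property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <value>org.apache.hadoop.net.SocksSocketFactory</value>
  </property>
  <!-- Hypothetical local SOCKS proxy endpoint. -->
  <property>
    <name>hadoop.socks.server</name>
    <value>localhost:6666</value>
  </property>
</configuration>
```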