I'm not sure there is a way to solve this while still handling multiple runtimes, but I'm documenting it as a JIRA regardless.
If you are running in a multi-cluster environment where you read data from one cluster and write the output to another (e.g. generating HFiles to be loaded into a separate HBase cluster), the cost of moving the files is noticeable. This seems to be because the files are moved in the launcher/driver process rather than as part of node execution.
A more efficient option would be to kick off a DistCp instead, but that would tie the target directly to a runtime, which is not a great approach.