It is probably partly my fault as well; I'm still working on it. I've added a build script to prepare the dependencies.
These are actually very good questions! Is it OK if I include these in the readme?
> How did you ensure that the extra parameters are passed to the mappers and the Sqoop1 cmd line tool?
The generic `-D` options are parsed by Hadoop before Sqoop's own arguments (they have to come first on the command line), so `mapreduce.map.java.opts` and `mapreduce.map.env` end up in the job configuration and are applied to every mapper JVM:

```sh
sqoop import \
  -Dyarn.app.mapreduce.am.env="JAVA_HOME=/usr/lib/jvm/jdk1.8.0_60/jre" \
  -Dmapreduce.map.java.opts="-XX:+PreserveFramePointer -XX:InlineSmallCode=200" \
  -Dmapreduce.map.env="JAVA_HOME=/usr/lib/jvm/jdk1.8.0_60/jre" \
  --connect [server] --username [username] \
  --target-dir [hdfs dir] --table [table] \
  -P -m [number of mappers]
```
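If you want to verify that the options actually reached the mappers, you can look for them on a worker while the job runs (the host name below is just a placeholder):

```sh
# While the import is running, the mapper JVMs on a worker node
# should show the injected flags on their command lines.
ssh worker-node "ps -ef | grep -- '-XX:+PreserveFramePointer' | grep -v grep"
```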
> In case of -c which server/service IP is needed there and in what kind of format?
In the case of YARN, you will have to point -c at the ResourceManager's REST API so the tool can fetch the cluster nodes: http://[resourcemanager]:8088/ws/v1/cluster/nodes
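A quick way to check that the endpoint is reachable (the host name is a placeholder for your ResourceManager):

```sh
# Lists the cluster's NodeManagers; the response is JSON describing
# every node known to the ResourceManager.
curl -s "http://resourcemanager:8088/ws/v1/cluster/nodes"
```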
> Is it enough to use only with -h?
Yes, but the sampling frequency will only be 99 Hz, with a sampling duration of 5 seconds. These default values can be changed by specifying the corresponding parameters, or by editing hprofiler.sh; see the sketch below.
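Something along these lines should do it. Note that the flag names below are only illustrative guesses, so check the argument parsing in hprofiler.sh for the real option names:

```sh
# Hypothetical flags: assume -F sets the sampling frequency in Hz and
# -d the duration in seconds; verify against hprofiler.sh before use.
./hprofiler.sh -h [hosts] -F 200 -d 30
```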
> If I don't wanna use SSH keys is it a valid solution if I type my root password all the time when it executes SSH?
I think this might work, but I'm not sure, because the processes are initiated in parallel, so I don't know how stdin will handle this. However, you can create a file with your password, then cat the file and pipe the password to the stdin of the process. You can do this by editing src/host_executor.sh.
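One caveat: ssh itself usually refuses to read a password from stdin (it insists on a TTY), so if the plain pipe doesn't work, a helper like sshpass is the usual workaround. A minimal sketch, assuming root access and a hypothetical worker host:

```sh
# Store the password in a file only you can read.
echo 'your-password' > ~/.hprofiler_pass
chmod 600 ~/.hprofiler_pass

# sshpass supplies the password to ssh's prompt, so parallel
# invocations work without any interactive input.
sshpass -f ~/.hprofiler_pass ssh root@worker-node "echo connected"
```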
> Should it work just out of box ( e.g. defining -h -j -f -t ) or is any postprocess needed?
No post-processing is required. However, interpreting the results might be troublesome.
> Did you use a hadoop distribution like CDH or just pure hdfs and sqoop1?
Yes, we use Sqoop 1.4.6 on CDH 5.5.1.
I hope this helps! If you still have any issues, feel free to contact me; I'm definitely willing to improve this tool.