You need to fix the new FindBugs warning.
The warning is harmless; maybe we will suppress it.
I can't see the need to have different debug scripts for mappers and reducers.
We need two scripts, since the mapper and reducer code are entirely different, and often we need to debug only one of them. For example, streaming will have two different scripts for the mapper and reducer, and users would like to debug them separately.
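For illustration, a streaming job could ship a mapper-only debug script along these lines. The script name, argument order, and file names below are hypothetical, not Hadoop's actual interface; the sketch only shows why a mapper-specific script is useful on its own.

```shell
# Hypothetical mapper-only debug script (names and layout illustrative).
cat > debug_mapper.sh <<'EOF'
#!/bin/sh
# $1 = captured task stdout, $2 = captured task stderr
echo "=== mapper failure summary ==="
tail -n 20 "$2"
EOF
chmod +x debug_mapper.sh

# Simulate the files a failed mapper attempt left behind.
printf 'processed 100 records\n' > task.stdout
printf 'java.io.IOException: bad record at offset 42\n' > task.stderr

./debug_mapper.sh task.stdout task.stderr
```

A reducer-side script could do something entirely different (e.g. inspect shuffle input), which is the reason for keeping the two scripts separate.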
I think all of the output (stdout and stderr) from the debug script should be combined when it is stored on the TaskTracker.
This can be done by concatenating the files if we want. But redirection within the command itself is not possible, since we don't know the order in which the two streams were written.
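The concatenation approach can be sketched as below (file names illustrative). Note that picking a fixed order is the best we can do after the fact; the original interleaving of the two streams is not recoverable, which is exactly the objection above to redirecting in the command.

```shell
# Simulated captured streams from a finished debug-script run.
printf 'map output written\n'  > task.stdout
printf 'WARN: spill to disk\n' > task.stderr

# Combine in a fixed (stdout-first) order for storage.
cat task.stdout task.stderr > task.combined
cat task.combined
```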
I don't think adding the concept of an executable to the file cache is appropriate. It is basically compensating for the lack of permissions in HDFS, which will be addressed more directly. In the meantime, I think that all files coming out of the cache should have the "x" permission set. Note that pipes and streaming already do this...
OK, this can be done. Then should we create symlinks for all files?
Why were the config files for the pipes examples changed to add the "#" part of the URL?
For running the gdb script by default, the C++ executable must be present in the current working directory. So we need a symlink to the executable.
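The two mechanics discussed here (the "x" bit on cache files, plus a symlink into the task's working directory) can be sketched as follows. The directory names and the `wordcount` program are illustrative stand-ins, not the real task-tracker layout:

```shell
# Sketch, assuming a per-job cache dir and a per-task work dir.
mkdir -p cache_dir work_dir

# A stand-in for the distributed executable.
printf '#!/bin/sh\necho running wordcount\n' > cache_dir/wordcount

# All files coming out of the cache get the "x" permission set.
chmod +x cache_dir/wordcount

# Symlink it into the task's cwd so a default gdb script can
# find the executable by name in the working directory.
ln -sf "$(pwd)/cache_dir/wordcount" work_dir/wordcount

(cd work_dir && ./wordcount)
```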
Rather than let the user specify a command line that has a bunch of (undocumented) @variables, I think it would be better to always use the same parameters: basically, something like $script @stdout@ @stderr@ @jobconf@, and let the script find the core file if it cares.
Now we are using @stdout@, @stderr@, @syslog@ and @core@ for the command.
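To make the mechanism concrete, @token@ expansion of a user-supplied command could look like the sketch below. The token names come from this discussion; the paths and the sed-based expansion are illustrative assumptions, not the actual implementation:

```shell
# User-configured debug command with @token@ placeholders.
cmd='sh my_debugger.sh @stdout@ @stderr@ @syslog@ @core@'

# Expand each token to the real per-attempt file path (paths illustrative).
expanded=$(printf '%s\n' "$cmd" | sed \
  -e 's|@stdout@|/tmp/task/stdout|' \
  -e 's|@stderr@|/tmp/task/stderr|' \
  -e 's|@syslog@|/tmp/task/syslog|' \
  -e 's|@core@|/tmp/task/core|')

echo "$expanded"
```

The expanded string is what the TaskTracker would then execute; a script that does not care about a given file simply ignores that argument.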
Since pipes has a default gdb script which needs the core file, we can keep that code. It's a convenience for the user; if you insist, we can remove it.
By default, the entire output of the script should be added to the diagnostics; 5 lines is much, much too small.
OK, this will be done.