Description
There are two similar minor issues regarding running Pig scripts located in non-local FileSystems such as hdfs and s3.
- The first occurs when the script path is passed using the ‘-f’ option. In this case, the script contents are not set in ScriptState. Instead a WARN message is logged due to an IOException being thrown. This is because the ‘remote’ path is treated as a local one. Instead, the path of the downloaded script should be passed over to ScriptState#setScript. As a result of this bug, an empty string is set for the “pig.script” property when the Pig job runs on a Hadoop cluster. Also, if Tez is being used, then the Dag info does not include the script contents as it normally does when a local script is passed.
- The second issue is more minor, but #validateLogFile in the Main class is set to use the path given by the user rather than using the downloaded local file path. Again, #validateLogFile method treats the given path as a local one, but this would not be the case if the user specifies a remote path. i.e. one with a scheme such as hdfs or s3. This occurs in both cases: when the script is specified using the ‘-f’ option or when the script is passed as the last/remaining argument.
Both fixes to these issues are to just pass in the local downloaded path instead. If the script path specified is a local one, then the local downloaded path would just be that path specified.