Pi, thanks for your prompt feedback. Couple of follow up questions/comments.
>> 1. You assume no escaping in shell command, right?
Yes, I assume that the command is formed such that it can run as is. At this point it does not allow backticks inside of the command. If this turned out to be needed we can change that. The only escaping that is done is for $identifier to prevent the substitution. This is done by the substitution code itself.
>> 2. The name "UtilFunctions" implies it does not hold state (even global state). From the way it is used, we should have a better name or refactor is needed.
This is how the code was submitted to me and renaming things is not one of things I like to do . Since it is not an interface point, I don't really want to change names.
>> 3. PigFileParser.unquote still doesn't do escaping.
There are two types of escaping supported for the lines: for quotes and for $identifier. Since the preprocessor does not interpret any other data, I don't think anything else is needed.
>> 4. <DEFALT> token in PigFileParser misspelled
Will fix that .
>> 5. Why ParamLoader.Parse() throw IOException?
I am not sure but seems like all parser we have do that.
>> 6. In UtilFunctions.substitute, what does "replaced_line = replaced_line.replaceAll("\\\\\\$","
The indent is to allow escaping $identifier. I am not exactly sure why it requires so many escapes. I can try and see if there is another more intuitive way to do that.
>> 7. Shouldn't logger be declared "private final Logger logger = Logger.getLogger("org.apache.pig.preprocessor.log");" (everything in one line) to make it consistent?
Will fix that
>> 1. I prefer HashMap to HashTable
Any particular reason?
2. Pattern identifier in UtilFunctions.substitute can be made static, this makes it 1 microsecond faster
Will fix that .
I also realized that I we don't have to require that single word parameters have to be quoted. I will make that change in the parser, make the updates that you suggested and submit new patch