Converted the shell script to Python in a platform-independent way. It should work with Python 2.7.x.
Patch committed to trunk. pig.py needs more tests, but that could be a new Jira. Thanks Vikram!
Integrated the Python script with the e2e tests. When running the test-e2e target, we can run the tests through the Python script by setting the flag -Dharness.use.python=true,
e.g. ant -Dharness.old.pig=/grid/0/pig/old_pig/ -Dharness.cluster.conf=/usr/lib/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop -Dharness.use.python=true test-e2e
Patch looks reasonable. But we need tests to ensure that pig.py responds in the same way as the current pig bash script. These could easily be written as a new driver in the e2e framework.
Addressed Alan's comments.
    confFileHdl = open(os.path.join(os.environ['PIG_CONF_DIR'], 'pig.conf'), 'r')
    for line in confFileHdl:
        words = line.split()
        if len(words) > 2:  # since we expect only key value pairs
            os.environ[words[0]] = words[1]
Won't the test "len(words) > 2" mean we reject lines with comments? E.g. "key=value # this is a comment"
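One way to handle the trailing-comment case is to strip the comment before splitting. This is only a minimal sketch, not the code in the patch: parse_conf_line and load_conf are hypothetical names, and it assumes that '#' always starts a comment and that a pair is written either as key=value or as key and value separated by whitespace.

```python
def parse_conf_line(line):
    """Return (key, value) for a config line, or None if it holds no pair.

    Hypothetical helper: assumes '#' always starts a comment, so a value
    that contains '#' would be truncated.
    """
    # Drop a trailing comment, then surrounding whitespace.
    content = line.split('#', 1)[0].strip()
    if not content:
        return None
    if '=' in content:
        key, value = content.split('=', 1)
    else:
        # Fall back to "key value" separated by whitespace.
        parts = content.split(None, 1)
        if len(parts) != 2:
            return None
        key, value = parts
    key, value = key.strip(), value.strip()
    if not key:
        return None
    return key, value


def load_conf(lines, env):
    """Copy every key-value pair found in lines into the env mapping."""
    for line in lines:
        pair = parse_conf_line(line)
        if pair is not None:
            env[pair[0]] = pair[1]
    return env
```

Passing an open file handle as lines and os.environ as env would mirror the loop above, while a plain dict keeps the function easy to test.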
2) In the HCat section, shouldn't we look for the Hive jar in /usr/lib/hive when HIVE_HOME isn't set, since that's where the Bigtop RPMs put it? The same goes for HCAT_HOME.
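The fallback suggested here could look roughly like the following. It is a sketch under assumptions: resolve_home is a hypothetical helper, not a function in pig.py, and the only default path taken from this thread is /usr/lib/hive. The default for HCAT_HOME is left to the caller.

```python
import os


def resolve_home(var_name, default_dir, env=None, isdir=os.path.isdir):
    """Return the directory named by var_name, else default_dir if it exists.

    Hypothetical helper: env and isdir are injectable so the lookup can be
    exercised without touching the real environment or filesystem.
    """
    if env is None:
        env = os.environ
    configured = env.get(var_name)
    if configured:
        # An explicit setting always wins over the packaged default.
        return configured
    if isdir(default_dir):
        return default_dir
    return None


# Example: prefer HIVE_HOME, fall back to the Bigtop RPM location.
hive_home = resolve_home("HIVE_HOME", "/usr/lib/hive")
```

Returning None when neither location is available leaves it to the caller to decide whether a missing Hive install is an error.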
Works with Python 2.4 and 2.7 as well. To run successfully, please set the HADOOP_HOME and JAVA_HOME values in the environment or provide them as key-value pairs in a pig.conf file in PIG_CONF_DIR.
Tested this on 2.4.3 as well. It needs HADOOP_HOME and JAVA_HOME to be set or present as a set of key-value pairs in the PIG_CONF_DIR/pig.conf. Please let me know if you run into issues.
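For anyone wrapping the script, the requirement above can be checked up front. A minimal sketch, assuming the pairs from pig.conf have already been read into a dict; find_missing_settings is a hypothetical name, not part of pig.py.

```python
# The two settings this thread says must come from the environment or pig.conf.
REQUIRED_SETTINGS = ("HADOOP_HOME", "JAVA_HOME")


def find_missing_settings(env, conf_pairs, required=REQUIRED_SETTINGS):
    """Return the required names that are set in neither env nor conf_pairs."""
    return [name for name in required
            if not env.get(name) and not conf_pairs.get(name)]
```

Calling find_missing_settings(os.environ, conf_pairs) before launching gives a list that can be reported in one message instead of failing on the first missing value.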
Very cool that the pig script will be switched over to Python!
Just an FYI though: even though Python 2.7 was released quite some time back, it's likely not installed on a lot of production machines. For example, RHEL5 is probably the most widely deployed version, and it ships with Python 2.4 as the default. Of course people could install newer versions, but they may not be available by default.
To run this script, download and install Python 2.7.x, then run it as python bin/pig.py. Ensure that HADOOP_HOME is set to the correct location, although the script tries to deduce it on its own.
This has been tested with python 2.7.1