Converted the shell script to Python in a platform-independent way. It should work with Python 2.7.x.
This has been tested with Python 2.7.1.
To run this script, download and install Python 2.7.x, then run it as python bin/pig.py. Ensure that HADOOP_HOME is set to the correct location, although the script tries to deduce it when it is not set.
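The thread does not say how pig.py deduces HADOOP_HOME, so the following is only a sketch of one plausible approach, not the patch's logic. The function name deduce_hadoop_home and the assumption that the hadoop executable lives at HADOOP_HOME/bin/hadoop are mine. It also uses shutil.which, which needs Python 3.3 or later, so it would have to be adapted for 2.x.

```python
import os
import shutil


def deduce_hadoop_home(environ=None, which=shutil.which):
    """Return a best-guess HADOOP_HOME, or None if nothing can be found."""
    if environ is None:
        environ = os.environ
    # An explicitly set HADOOP_HOME always wins.
    explicit = environ.get("HADOOP_HOME")
    if explicit:
        return explicit
    # Assumption: the executable sits at <HADOOP_HOME>/bin/hadoop, so the
    # home directory is two levels above whatever is found on PATH.
    hadoop_bin = which("hadoop")
    if hadoop_bin:
        return os.path.dirname(os.path.dirname(os.path.realpath(hadoop_bin)))
    return None
```

Calling deduce_hadoop_home() with no arguments consults the real environment and PATH; the two parameters exist only so the lookup can be exercised without a Hadoop install.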
Very cool that the pig script will be switched over to Python!
Just an FYI though: even though Python 2.7 was released quite some time back, it's likely not installed on a lot of production machines. For example, RHEL5 is probably the most widely deployed version, and it ships with Python 2.4 as the default. Of course people could install newer versions, but they may not be available by default.
Tested this on 2.4.3 as well. It needs HADOOP_HOME and JAVA_HOME to be either set in the environment or present as key-value pairs in PIG_CONF_DIR/pig.conf. Please let me know if you run into issues.
Works with both Python 2.4 and 2.7. To run successfully, please set the HADOOP_HOME and JAVA_HOME values in the environment or provide them as key-value pairs in a pig.conf file in PIG_CONF_DIR.
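As a rough illustration of the requirement above, the lookup could be sketched like this. The function name resolve_settings, the dictionary arguments, and the choice to let the environment take precedence over pig.conf are assumptions for the example, not a description of what pig.py does.

```python
REQUIRED = ("HADOOP_HOME", "JAVA_HOME")


def resolve_settings(environ, conf_pairs, required=REQUIRED):
    """Resolve required settings from the environment, then from pig.conf.

    environ and conf_pairs are plain dictionaries; conf_pairs stands in for
    the key-value pairs already read from pig.conf.
    """
    resolved = {}
    missing = []
    for name in required:
        # Assumed precedence: environment first, conf file second.
        value = environ.get(name) or conf_pairs.get(name)
        if value:
            resolved[name] = value
        else:
            missing.append(name)
    if missing:
        raise ValueError(
            "not set in environment or pig.conf: " + ", ".join(missing)
        )
    return resolved
```

Failing early with the names of the missing variables makes it clearer to the user which of the two settings still has to be provided.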
confFileHdl = open(os.path.join(os.environ['PIG_CONF_DIR'], 'pig.conf'), 'r')
for line in confFileHdl:
    words = line.split()
    if len(words) > 2: # since we expect only key value pairs
        os.environ[words[0]] = words[1]
Won't the test "len(words) > 2" mean we reject lines with comments? E.g. "key=value # this is a comment"
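One way to handle trailing comments is to strip them before looking for the pair. This is a minimal sketch, not the code from the patch: parse_conf_line is a hypothetical helper, and accepting both "key=value" and "key value" forms is an assumption, since the thread uses both spellings.

```python
def parse_conf_line(line):
    """Return (key, value) for one pig.conf line, or None if it has no pair."""
    # Drop everything from the first '#' onward, then surrounding whitespace.
    content = line.split("#", 1)[0].strip()
    if not content:
        return None
    # Accept either "key=value" or "key value".
    if "=" in content:
        key, value = content.split("=", 1)
    else:
        parts = content.split(None, 1)
        if len(parts) != 2:
            return None
        key, value = parts
    key, value = key.strip(), value.strip()
    if not key or not value:
        return None
    return key, value
```

With this helper, parse_conf_line("key=value # this is a comment") returns ("key", "value"), and a comment-only line returns None. A known limitation of the sketch is that a value containing '#' would be truncated.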
2) In the HCat section, we should look for the Hive jar in /usr/lib/hive when HIVE_HOME isn't set, since that's where the Bigtop RPMs put it. The same applies to HCAT_HOME.
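The fallback suggested in this point could be sketched as follows. find_home is a hypothetical helper, not part of the patch; only the /usr/lib/hive location comes from the comment above, and the default directory for HCAT_HOME is not given in the thread, so it is left to the caller.

```python
import os


def find_home(var_name, fallback_dir, environ=None, isdir=os.path.isdir):
    """Return the directory named by var_name, else an existing fallback."""
    if environ is None:
        environ = os.environ
    # Prefer the explicitly configured location.
    configured = environ.get(var_name)
    if configured:
        return configured
    # Otherwise use the packaged default, but only if it actually exists.
    if isdir(fallback_dir):
        return fallback_dir
    return None


# Example call for Hive, using the Bigtop location mentioned above.
hive_home = find_home("HIVE_HOME", "/usr/lib/hive")
```

Returning None when neither location is available lets the caller decide whether a missing Hive install is an error or simply means the HCat section is skipped.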
Addressed Alan's comments.
The patch looks reasonable, but we need tests to ensure that pig.py responds in the same way as the current pig bash script. These could easily be written as a new driver in the e2e framework.
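Outside the e2e framework, the kind of parity check described here could be sketched as a small standalone comparison. This is not an e2e driver and not part of the patch; run_launcher and same_behaviour are hypothetical names, and comparing only the exit code and standard output is an assumption about what "responds in the same way" should mean. It uses subprocess.run, which needs Python 3.5 or later.

```python
import subprocess


def run_launcher(command):
    """Run a command list and return (exit code, stdout, stderr)."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return completed.returncode, completed.stdout, completed.stderr


def same_behaviour(command_a, command_b):
    """True when both commands give the same exit code and stdout."""
    code_a, out_a, _ = run_launcher(command_a)
    code_b, out_b, _ = run_launcher(command_b)
    return code_a == code_b and out_a == out_b
```

In practice the two commands would be the bash launcher and python bin/pig.py invoked with identical arguments. Standard error is captured but not compared, since log output may legitimately differ between the two launchers.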
Integrated the Python script with the e2e tests. When running the test-e2e target, the Python script can be used to run the tests by passing the flag -Dharness.use.python=true,
e.g. ant -Dharness.old.pig=/grid/0/pig/old_pig/ -Dharness.cluster.conf=/usr/lib/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop -Dharness.use.python=true test-e2e
Patch committed to trunk. pig.py needs more tests, but that could be a new Jira. Thanks Vikram!