Pig / PIG-2873

Converting bin/pig shell script to python

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.12.0
    • Component/s: tools
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Tags:
      Python, Pig

      Description

      Converted the bin/pig shell script to Python in a platform-independent way. It should work with Python 2.7.x.

      1. PIG-2873.patch
        12 kB
        Vikram Dixit K
      2. PIG-2873_2.patch
        12 kB
        Vikram Dixit K
      3. PIG-2873_3.patch
        13 kB
        Vikram Dixit K
      4. PIG-2873_4.patch
        17 kB
        Vikram Dixit K

        Activity

        Daniel Dai added a comment -

        Patch committed to trunk. pig.py needs more tests, but that could be a new Jira. Thanks Vikram!

        Vikram Dixit K added a comment -

        Integrated the Python script with the e2e tests. When running the test-e2e target, pass the following flag to run the tests through the Python script instead of the shell script:

        -Dharness.use.python=true
        
        e.g. ant -Dharness.old.pig=/grid/0/pig/old_pig/ -Dharness.cluster.conf=/usr/lib/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop -Dharness.use.python=true test-e2e
        
        
        Alan Gates added a comment -

        Vikram,

        Patch looks reasonable. But we need tests to ensure that pig.py behaves the same way as the current pig bash script. These could easily be written as a new driver in the e2e framework.

        Vikram Dixit K added a comment -

        Addressed Alan's comments.

        Alan Gates added a comment -

        Comments:

        1)

        try:
          confFileHdl = open(os.path.join(os.environ['PIG_CONF_DIR'], 'pig.conf'), 'r')
          for line in confFileHdl:
            words = line.split()
            if len(words) > 2: # since we expect only key value pairs
              continue
            else:
              os.environ[words[0]] = words[1]
        

        Won't the test "len(words) > 2" mean we reject lines with comments? E.g. "key=value # this is a comment"

        2) In the HCat section, shouldn't we look for the hive jar in /usr/lib/hive when HIVE_HOME isn't set, since that's where the Bigtop RPMs put it? Same for HCAT_HOME.
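        Both review points can be illustrated with a minimal sketch. This is not the code from the patch: the function names parse_conf_line, load_conf and resolve_home are hypothetical, and the pig.conf syntax is an assumption (either "key=value" or "key value", with "#" starting a comment). The idea is to strip comments before counting words, and to fall back to a default directory when a *_HOME variable is unset.

```python
def parse_conf_line(line):
    """Return (key, value) for one config line, or None if it holds no pair.

    Assumed syntax (not confirmed by the patch): 'key=value' or 'key value',
    with anything after '#' treated as a comment.
    """
    # Drop the comment first, so trailing comments never count as extra words.
    content = line.split('#', 1)[0].strip()
    if not content:
        return None  # blank line or comment-only line
    if '=' in content:
        key, value = content.split('=', 1)
    else:
        parts = content.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            return None  # a lone word is not a key-value pair
        key, value = parts
    key, value = key.strip(), value.strip()
    if not key or not value:
        return None
    return key, value


def load_conf(lines, env):
    """Copy every valid pair from lines into env (a dict-like object)."""
    for line in lines:
        pair = parse_conf_line(line)
        if pair is not None:
            # Like the reviewed snippet, a conf entry overwrites an existing value.
            env[pair[0]] = pair[1]
    return env


def resolve_home(env, var_name, default_dir):
    """Return env[var_name] if it is set and non-empty, else default_dir."""
    return env.get(var_name) or default_dir
```

        In a script this might be called as load_conf(open(path), os.environ) followed by resolve_home(os.environ, 'HIVE_HOME', '/usr/lib/hive'). One limitation of the sketch: a value that itself contains "#" would be truncated.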

        Vikram Dixit K added a comment -

        Works with Python 2.4 as well as 2.7. To run successfully, set HADOOP_HOME and JAVA_HOME in the environment, or provide them as key-value pairs in a pig.conf file in PIG_CONF_DIR.

        Vikram Dixit K added a comment -

        Tested this on Python 2.4.3 as well. It needs HADOOP_HOME and JAVA_HOME to be set in the environment or present as key-value pairs in PIG_CONF_DIR/pig.conf. Please let me know if you run into issues.
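        The requirement above can be checked up front. The following is a sketch only, not taken from pig.py; missing_settings and REQUIRED are hypothetical names, and the mapping passed in is assumed to already contain any values merged from pig.conf.

```python
# Settings the comment above says must be available before pig.py can run.
REQUIRED = ('HADOOP_HOME', 'JAVA_HOME')


def missing_settings(env, required=REQUIRED):
    """Return the required names that are unset or empty in env, in order."""
    return [name for name in required if not env.get(name)]
```

        A launcher could call missing_settings(os.environ) and exit with a message naming the missing variables when the returned list is not empty.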

        Travis Crawford added a comment -

        Very cool the pig script will be switched over to python!

        Just an FYI though: even though Python 2.7 was released quite some time back, it's likely not installed on a lot of production machines. For example, RHEL5 is probably the most widely deployed version, and it ships with Python 2.4 as the default. Of course people could install newer versions, but they may not be available by default.

        Vikram Dixit K added a comment -

        To run this script, download and install Python 2.7.x, then run it as python bin/pig.py. The script tries to deduce HADOOP_HOME on its own, but make sure it is set to the correct location.

        Vikram Dixit K added a comment -

        This has been tested with Python 2.7.1.


          People

          • Assignee:
            Vikram Dixit K
            Reporter:
            Vikram Dixit K
          • Votes:
            0
            Watchers:
            4