Pig
  1. Pig
  2. PIG-2165

Need a way to deal with params and param_file in embedded pig in python

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.10.0
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I am using embedded pig in python and cannot pass param key value pairs to the python script. The only way to pass params seem to be by passing it in the bind command.

      Is there a plan to have command line parameters to a pig embedded python script? Similar needs for param_file and using the environment variables.

      Thanks
      Supreeth

      1. PIG-2165-1.patch
        8 kB
        Daniel Dai
      2. PIG-2165-2.patch
        9 kB
        Daniel Dai

        Issue Links

          Activity

          Hide
          Daniel Dai added a comment -

          Unit test pass. test-patch:
          [exec] -1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] -1 release audit. The applied patch generated 468 release audit warnings (more than the trunk's current 461 warnings).

          No new files added, ignore release audit warning.

          Show
          Daniel Dai added a comment - Unit test pass. test-patch: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 468 release audit warnings (more than the trunk's current 461 warnings). No new files added, ignore release audit warning.
          Hide
          Daniel Dai added a comment -

          Patch committed to both trunk and 0.10 branch.

          Show
          Daniel Dai added a comment - Patch committed to both trunk and 0.10 branch.
          Hide
          Daniel Dai added a comment -

          Add new test case as Ashutosh suggest.

          Show
          Daniel Dai added a comment - Add new test case as Ashutosh suggest.
          Hide
          Ashutosh Chauhan added a comment -

          +1 Looks good to me. If its easy enough, it will be good to add e2e test with param file as well, since patch touches upon that part of code.

          Show
          Ashutosh Chauhan added a comment - +1 Looks good to me. If its easy enough, it will be good to add e2e test with param file as well, since patch touches upon that part of code.
          Hide
          Daniel Dai added a comment -

          Hi, Julien,
          PIG-2165-1.patch does use global variable PigContext.params to pass params. What it does is:
          1. Put params into PigContext.params when processing command line
          2. Adding PigContext.params to do variable binding:

                  // plist is the variable array to bind
                  if (getScriptContext().getPigContext().getParams()!=null) {
                      for (String param : getScriptContext().getPigContext().getParams()) {
                          plist.add(param);
                      }
                  }
                  // Use plist to do the binding
          

          3. For param_file, instead of reading and put into params, I delegate it to ParameterSubstitutionPreprocessor, which already handles parameter_file:

              psp.genSubstitutedFile(in, writer, plist.toArray(params), scriptContext.getPigContext().getParamFiles().toArray(type1));
          

          4. I also did some refactory in main() to use PigContext.params

          Show
          Daniel Dai added a comment - Hi, Julien, PIG-2165 -1.patch does use global variable PigContext.params to pass params. What it does is: 1. Put params into PigContext.params when processing command line 2. Adding PigContext.params to do variable binding: // plist is the variable array to bind if (getScriptContext().getPigContext().getParams()!= null ) { for ( String param : getScriptContext().getPigContext().getParams()) { plist.add(param); } } // Use plist to do the binding 3. For param_file, instead of reading and put into params, I delegate it to ParameterSubstitutionPreprocessor, which already handles parameter_file: psp.genSubstitutedFile(in, writer, plist.toArray(params), scriptContext.getPigContext().getParamFiles().toArray(type1)); 4. I also did some refactory in main() to use PigContext.params
          Hide
          Daniel Dai added a comment -

          Thanks Julien. I'm fine with global variable approach. I can submit another patch.

          Show
          Daniel Dai added a comment - Thanks Julien. I'm fine with global variable approach. I can submit another patch.
          Hide
          Julien Le Dem added a comment -

          Hi Daniel,
          This works but I would rather have the parameters set to global variables in the Interpreter.
          Using the pre-processor is less intuitive as parameter values will need different escaping depending of the language or the location of the script where they are used. For example in a string vs not. Assigning to a global variable avoids this kind of issue.
          Julien

          Show
          Julien Le Dem added a comment - Hi Daniel, This works but I would rather have the parameters set to global variables in the Interpreter. Using the pre-processor is less intuitive as parameter values will need different escaping depending of the language or the location of the script where they are used. For example in a string vs not. Assigning to a global variable avoids this kind of issue. Julien
          Hide
          Daniel Dai added a comment -

          Yes, for "$var" in your script, it can be either a bounded variable, or a command line parameter. I treat them the same in patch.

          Show
          Daniel Dai added a comment - Yes, for "$var" in your script, it can be either a bounded variable, or a command line parameter. I treat them the same in patch.
          Hide
          Julien Le Dem added a comment -

          Hi Daniel,
          Are you suggesting to use the preprocessor to replace $variable with its value passed in "-p variable=value" ?

          Show
          Julien Le Dem added a comment - Hi Daniel, Are you suggesting to use the preprocessor to replace $variable with its value passed in "-p variable=value" ?
          Hide
          Daniel Dai added a comment -

          I think we shall stick with the conventional Pig command line options (-p, -m). I don't think we need to invent new ways specific for Pig embedding.

          Show
          Daniel Dai added a comment - I think we shall stick with the conventional Pig command line options (-p, -m). I don't think we need to invent new ways specific for Pig embedding.
          Hide
          Julien Le Dem added a comment -

          I agree, on second thoughts my second option is a bad idea as the script can be launched either through the command line or programmatically through the java API. In that case the only parameters that are available are the ones passed through -p.
          There's no good way to reuse sys.argv here, it is not really a good fit as it is a list of strings.

          Suggestions:

          • a separate dictionary: Pig.getParameters() ?
          • placed in global variables
          Show
          Julien Le Dem added a comment - I agree, on second thoughts my second option is a bad idea as the script can be launched either through the command line or programmatically through the java API. In that case the only parameters that are available are the ones passed through -p. There's no good way to reuse sys.argv here, it is not really a good fit as it is a list of strings. Suggestions: a separate dictionary: Pig.getParameters() ? placed in global variables
          Hide
          Alan Gates added a comment -

          I like the idea of only passing the parameters from either the command line or the parameters file. I don't see why the Python script should care about other parameters that will be Pig specific. Whether those parameters are placed in global variables, in sys.argv, or in a separate dictionary (pig_argv?) I don't have an opinion on.

          Show
          Alan Gates added a comment - I like the idea of only passing the parameters from either the command line or the parameters file. I don't see why the Python script should care about other parameters that will be Pig specific. Whether those parameters are placed in global variables, in sys.argv, or in a separate dictionary (pig_argv?) I don't have an opinion on.
          Hide
          Julien Le Dem added a comment -

          I see two options:

          • One way would be to use the pig parameters and assign the values to global variables.
            pig -p foo=bar myscript.py

            and then in the script, the variable foo contains "bar"

          • The other would be to populate sys.argv with the command line parameters
            sys.argv = [ "pig", "-p", "foo=bar", "myscript.py" ]
          Show
          Julien Le Dem added a comment - I see two options: One way would be to use the pig parameters and assign the values to global variables. pig -p foo=bar myscript.py and then in the script, the variable foo contains "bar" The other would be to populate sys.argv with the command line parameters sys.argv = [ "pig" , "-p" , "foo=bar" , "myscript.py" ]

            People

            • Assignee:
              Daniel Dai
              Reporter:
              Supreeth
            • Votes:
              1 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development