Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: grunt
    • Labels:
      None

      Description

      This is a request for a "run file" command in grunt which will read a script from the local file system and execute the script interactively while in the grunt shell.

      One of the things that slows down iterative development of large, complicated Pig scripts that operate on Hadoop fs data is the edit, run, debug cycle: I must wait to allocate a Hadoop-on-Demand (hod) cluster for each iteration. I would prefer not to preallocate a cluster of nodes (though I could).

      Instead, I'd like to have one window open and edit my Pig script using vim or emacs, write it, and then type "run myscript.pig" at the grunt shell until I get things right.

      I'm used to doing similar things with Oracle, MySQL, and R.

      1. run_command.patch
        10 kB
        Gunther Hagleitner
      2. run_command_params.patch
        14 kB
        Gunther Hagleitner
      3. run_command_params_021109.patch
        14 kB
        Gunther Hagleitner
      4. PIG-574.patch
        14 kB
        Olga Natkovich

        Activity

        Gunther Hagleitner added a comment -

        Introduces run and exec command

        Olga Natkovich added a comment -

        I will be reviewing this patch

        Olga Natkovich added a comment -

        I reviewed the patch and it looks good. I also ran unit tests and they all passed.

        In addition, I ran some manual tests and have a couple of comments:

        (1) As implemented now, scripts run from within grunt cannot take advantage of parameter substitution, as it is not available in interactive mode. I think this is ok for now and we can revisit it later if users ask for it.
        (2) When using run, I could integrate commands in my script with the interactive commands in the shell, which was really nice; however, the commands from the script did not show in the command history. If it is reasonably easy to integrate, it would be nice to have that functionality.
        (3) Very minor thing: after I execute the run command, I see a double prompt from grunt:

        grunt> grunt>

        Gunther Hagleitner added a comment -

        Thanks for reviewing the patch!

        I tried to address the 3 issues you pointed out:

        1) You can now specify parameters and param files in both the exec and run commands:

        grunt> run myscript.pig using param_file myparams.ppf
        or:
        grunt> run myscript.pig using param LIMIT=5 param_file myparams.ppf

        The syntax mimics the command-line syntax for executing a script, minus the "-"s.

        2) The script lines are now added to the command history in interactive mode

        3) The double grunt... That's actually harder to fix than I thought, but I added a newline, so it won't say:

        grunt> grunt>

        but:

        grunt>
        grunt>

        Let's just tell everyone that that's because they have extra newlines in their scripts. Maybe they won't find out.
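To make the parameter handling in item 1 concrete, here is a minimal Python sketch of merging a param file with an explicit parameter and substituting the values into a script. It is an illustration only, not the code in the patch: the `NAME = value` file format with `#` comments, the `$NAME` placeholder form, and the sample file and script contents are all assumptions, and the function names are hypothetical.

```python
import re


def load_param_file(text):
    """Parse 'NAME = value' lines; blank lines and '#' comments are skipped.

    The file format is an assumption made for this sketch.
    """
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a NAME=value line: {line!r}")
        params[name.strip()] = value.strip()
    return params


def substitute(script, params):
    """Replace each $NAME in the script with its value; fail on undefined names."""
    def repl(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"undefined parameter: {name}")
        return params[name]

    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", repl, script)


# Hypothetical contents standing in for myparams.ppf and myscript.pig.
param_file_text = "# sample parameters\nINPUT = /data/in.txt\n"
script_text = "A = LOAD '$INPUT';\nB = LIMIT A $LIMIT;\nDUMP B;"

params = load_param_file(param_file_text)
params.update({"LIMIT": "5"})  # an explicit parameter, as in `param LIMIT=5`
result = substitute(script_text, params)
print(result)
```

Failing loudly on an undefined name is a design choice of this sketch; it surfaces a missing parameter before any statement would run.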

        Gunther Hagleitner added a comment -

        Oh, I also ran the unit tests. They pass.

        David Ciemiewicz added a comment -

        Thanks!

        This will make iterative development so much faster and less painful than preallocating a HOD subcluster and then forgetting to delete it.

        Olga Natkovich added a comment -

        Gunther, thanks for the quick turnaround on the patch.

        Is there a reason why we can't use exactly the same syntax for parameter substitution on the run/exec commands as we do on the Pig command line? I think that might be easier for users to remember and would provide a more consistent interface.

        Gunther Hagleitner added a comment -

        Good point. I felt it was a little strange to specify "-param" on the grunt shell, but it is easier to remember if you're using it outside the shell already.

        So, this patch does the same as the last one, but the syntax is:

        run myscript.pig -param LIMIT=5 -param FILE=/foo/bar.txt -param_file myparams.ppf

        Olga Natkovich added a comment -

        I tested the patch and all is good. I have made one small change: moving the parameters in front of the pig script to be consistent with the pig cmd syntax. A new patch is attached.
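For readers following the syntax changes in this thread, the final form (parameters ahead of the script name) can be sketched with a small Python parser. This is only an illustration of the argument order, not the grunt parser from the patch; the function name `parse_run_command` is hypothetical, and the sample command line is a reordering of the earlier example rather than text taken from the patch.

```python
import shlex


def parse_run_command(line):
    """Split a grunt-style 'run' or 'exec' line into its parts.

    Assumed shape for this sketch:
        run [-param NAME=VALUE | -param_file FILE]... SCRIPT
    """
    tokens = shlex.split(line)
    if not tokens or tokens[0] not in ("run", "exec"):
        raise ValueError("expected a line starting with 'run' or 'exec'")
    command, rest = tokens[0], tokens[1:]

    params, param_files = {}, []
    i = 0
    # Consume flag/value pairs until the first token that is not a flag.
    while i < len(rest) and rest[i] in ("-param", "-param_file"):
        if i + 1 >= len(rest):
            raise ValueError(f"{rest[i]} needs a value")
        flag, value = rest[i], rest[i + 1]
        if flag == "-param":
            name, sep, val = value.partition("=")
            if not sep:
                raise ValueError(f"expected NAME=VALUE, got {value!r}")
            params[name] = val
        else:
            param_files.append(value)
        i += 2

    # Exactly one token, the script name, must remain.
    if len(rest) - i != 1:
        raise ValueError("expected exactly one script name after the parameters")
    return {
        "command": command,
        "params": params,
        "param_files": param_files,
        "script": rest[i],
    }


parsed = parse_run_command(
    "run -param LIMIT=5 -param FILE=/foo/bar.txt -param_file myparams.ppf myscript.pig"
)
print(parsed)
```

Because the loop stops at the first non-flag token, a script name placed before the parameters is rejected, which mirrors the "parameters first" ordering described in the comment above.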

        Olga Natkovich added a comment -

        Patch committed. Thanks Gunther for contributing!


          People

          • Assignee:
            Olga Natkovich
            Reporter:
            David Ciemiewicz
          • Votes:
            0
            Watchers:
            2
