Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-2850

Pig should support loading macro files as resources stored in JAR files

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None

      Description

      A file containing macros can be imported in pig like so:

      IMPORT 'some_path/my_macros.pig';
      

      It would be convenient if a macro file could be imported from a registered JAR as well. This would make it easier to distribute them. One could package a set of UDFs and macros in a single JAR. Once the JAR is registered any of the UDFs or macros can be used once.

      For example, support that some_path/my_macros.pig has been packaged in a JAR named my_macros.jar. The above code then becomes

      REGISTER my_macros.jar;
      IMPORT 'some_path/my_macros.pig';
      

      Pig would first check if the file is found at the path some_path/my_macros.pig, and failing that it would attempt to load a resource by that name. Since the JAR is registered it will find it.

      1. import_macros_from_jar.diff
        7 kB
        Matthew Hayes
      2. import_macros_from_jar_5.diff
        9 kB
        Matthew Hayes
      3. import_macros_from_jar_4.diff
        8 kB
        Matthew Hayes
      4. import_macros_from_jar_3.diff
        8 kB
        Matthew Hayes
      5. import_macros_from_jar_2.diff
        8 kB
        Matthew Hayes

        Issue Links

          Activity

          Hide
          dvryaboy Dmitriy V. Ryaboy added a comment -

          Committed to trunk.

          Show
          dvryaboy Dmitriy V. Ryaboy added a comment - Committed to trunk.
          Hide
          matterhayes Matthew Hayes added a comment -

          yep i did, sorry, now fixed

          Show
          matterhayes Matthew Hayes added a comment - yep i did, sorry, now fixed
          Hide
          dvryaboy Dmitriy V. Ryaboy added a comment -

          Fails to compile. Did you forget to add ResourceNotFoundException ?

          Show
          dvryaboy Dmitriy V. Ryaboy added a comment - Fails to compile. Did you forget to add ResourceNotFoundException ?
          Hide
          matterhayes Matthew Hayes added a comment -

          Oops I should have done an svn up. I resolved the conflicts and submitted patch #4.

          Show
          matterhayes Matthew Hayes added a comment - Oops I should have done an svn up. I resolved the conflicts and submitted patch #4.
          Hide
          dvryaboy Dmitriy V. Ryaboy added a comment -

          Ok.. it's actually a legit conflict, not just a merge issue. This code was changed in PIG-2866. I'll ask Bill to take a look.

          Show
          dvryaboy Dmitriy V. Ryaboy added a comment - Ok.. it's actually a legit conflict, not just a merge issue. This code was changed in PIG-2866 . I'll ask Bill to take a look.
          Hide
          dvryaboy Dmitriy V. Ryaboy added a comment -

          Patch isn't applying cleanly. I'll remediate and post rebased patch.

          Show
          dvryaboy Dmitriy V. Ryaboy added a comment - Patch isn't applying cleanly. I'll remediate and post rebased patch.
          Hide
          dvryaboy Dmitriy V. Ryaboy added a comment -

          Perfect, thanks for the explanation. I like patch 3, will commit.
          Sorry about the slow turnaround, feel free to ping if there are no comments for more than a few days next time.. we shouldn't let tickets languish like that.

          Show
          dvryaboy Dmitriy V. Ryaboy added a comment - Perfect, thanks for the explanation. I like patch 3, will commit. Sorry about the slow turnaround, feel free to ping if there are no comments for more than a few days next time.. we shouldn't let tickets languish like that.
          Hide
          matterhayes Matthew Hayes added a comment -

          The function could legitimately return null if the macro file is not found as a file or a resource. I thought an error about a resource not being found could be cryptic if the user isn't expecting this behavior when importing the macro file. So I checked for null in getMacroFile after attempting to fetch as a resource and then threw and exception about the file not being found.

          Alternatively I could throw a ResourceNotFoundException in fetchResource and catch this in getMacroFile. See diff #3 where I am doing this instead.

          Thanks!

          Show
          matterhayes Matthew Hayes added a comment - The function could legitimately return null if the macro file is not found as a file or a resource. I thought an error about a resource not being found could be cryptic if the user isn't expecting this behavior when importing the macro file. So I checked for null in getMacroFile after attempting to fetch as a resource and then threw and exception about the file not being found. Alternatively I could throw a ResourceNotFoundException in fetchResource and catch this in getMacroFile. See diff #3 where I am doing this instead. Thanks!
          Hide
          dvryaboy Dmitriy V. Ryaboy added a comment -

          Looks good. In fetchResource, you check if resourceStream is not null; if it is null, the whole function will return null. I suspect that would be surprising and cause NPEs downstream. Perhaps an exception-throwing assert would be better here? Is there a legit reason for this function to ever return null instead of throwing?

          Show
          dvryaboy Dmitriy V. Ryaboy added a comment - Looks good. In fetchResource, you check if resourceStream is not null; if it is null, the whole function will return null. I suspect that would be surprising and cause NPEs downstream. Perhaps an exception-throwing assert would be better here? Is there a legit reason for this function to ever return null instead of throwing?
          Hide
          dvryaboy Dmitriy V. Ryaboy added a comment -

          Sorry, didn't notice you updated. Will review.

          Show
          dvryaboy Dmitriy V. Ryaboy added a comment - Sorry, didn't notice you updated. Will review.
          Hide
          matterhayes Matthew Hayes added a comment -

          Is there anything else necessary for me to change?

          Show
          matterhayes Matthew Hayes added a comment - Is there anything else necessary for me to change?
          Hide
          matterhayes Matthew Hayes added a comment -

          Thanks for the feedback

          Replaced LOG.info with LOG.debug.
          Wrapped BufferedOutputStream around stream.

          For the test, I like creating the JAR in the test so the test is totally self contained. You don't need to refer to a file in a separate JAR to understand it. What do you think?

          Regarding writing to a temp file, I did it this way to mimic what happens with FileLocalizer. Other parts of QueryParserDriver are assuming the macro file is local and I didn't want to make significant changes. In this new diff I've moved the code to FileLocalizer.fetchResource(), which seems cleaner.

          Show
          matterhayes Matthew Hayes added a comment - Thanks for the feedback Replaced LOG.info with LOG.debug. Wrapped BufferedOutputStream around stream. For the test, I like creating the JAR in the test so the test is totally self contained. You don't need to refer to a file in a separate JAR to understand it. What do you think? Regarding writing to a temp file, I did it this way to mimic what happens with FileLocalizer. Other parts of QueryParserDriver are assuming the macro file is local and I didn't want to make significant changes. In this new diff I've moved the code to FileLocalizer.fetchResource(), which seems cleaner.
          Hide
          dvryaboy Dmitriy V. Ryaboy added a comment -

          Nice!
          couple of quick notes:

          most of the LOG.info calls should be LOG.debug

          OutputStream should probably be buffered

          The test involves creating a jar, writing it out to FS, and registering it.. how about we just check in a jar in test/resources ?

          It's a little weird to be writing out to a temp file.. would be cleaner to allow pig to pick up from a jar without intermediate storage. Ideas on how to do that?

          Show
          dvryaboy Dmitriy V. Ryaboy added a comment - Nice! couple of quick notes: most of the LOG.info calls should be LOG.debug OutputStream should probably be buffered The test involves creating a jar, writing it out to FS, and registering it.. how about we just check in a jar in test/resources ? It's a little weird to be writing out to a temp file.. would be cleaner to allow pig to pick up from a jar without intermediate storage. Ideas on how to do that?
          Hide
          matterhayes Matthew Hayes added a comment -

          Sample implementation with unit test

          Show
          matterhayes Matthew Hayes added a comment - Sample implementation with unit test

            People

            • Assignee:
              matterhayes Matthew Hayes
              Reporter:
              matterhayes Matthew Hayes
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development