Pig / PIG-1777

LoadFunc in a scripting language

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Provide a mechanism for loading custom objects from a Sequence file with the conversion from the object to Pig objects happening in a scripting language.
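      To make the idea concrete, here is a minimal sketch of what the scripted conversion step could look like. It is not taken from the attached patches: the `PageView` record type, its fields, and the `to_pig_tuple` function name are hypothetical stand-ins for a custom object stored in a sequence file and for the user-written conversion function.

```python
from collections import namedtuple

# Hypothetical stand-in for a custom object stored as a sequence file value.
PageView = namedtuple("PageView", ["url", "user_id", "duration_ms"])


def to_pig_tuple(key, value):
    """Flatten one (key, value) record into a tuple of simple fields.

    Assumption: the loader would call a function like this once per record
    and turn the returned tuple into a Pig tuple.
    """
    return (
        str(key),                     # record key as a string
        value.url,                    # plain string field
        int(value.user_id),           # numeric id stored as text in the object
        value.duration_ms / 1000.0,   # milliseconds converted to seconds
    )


record = to_pig_tuple("2011-01-05", PageView("http://example.com/", "42", 1500))
print(record)
```

      The point of the sketch is that all object-specific knowledge lives in the script, so the loader itself would not need to know anything about the custom type.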

      1. scripted-load-patch-3.patch (13 kB, John Meagher)
      2. Initial-scripted-load2.patch (7 kB, John Meagher)
      3. Initial-scripted-load.patch (7 kB, John Meagher)

        Activity

        Jonathan Coveney added a comment -

        Now that we get the digest email of available patches, it'll be important to cull the old ones. This clearly needs some work before it's in a committable state, so I'm removing the Patch Available flag.

        John Meagher added a comment -

        I'm not actively working on it. I'm not sure what "finding the function from the list of registered functions" means. I have no objections to delaying it and getting it right.

        Alan Gates added a comment -

        +1 for delaying this. I'd like to get it done but I won't have time before we branch for 0.9.

        Olga Natkovich added a comment -

        Alan and John - are you still working on this, and how close are you to solving it? We are getting to the point of stabilizing 0.9 and at this time would not want to make major changes or add major new functionality.

        Let me know if you are OK with delaying this to the next release.

        Alan Gates added a comment -

        Changes look good.

        One thing I missed before is that rather than finding the function from the list of registered functions as the eval funcs do, it reads it from a script file. We want to present a consistent look and feel for all UDFs, so we'll want to change this to work the same way as the eval funcs do. I'm happy to help with that change, since it may require knowledge of Pig internals. But it will be later this week or next week before I get to it.
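        As a rough illustration of the distinction drawn here, the sketch below resolves functions by name from a shared registry instead of reading them from a script file at load time. It does not reflect Pig's internals: the `REGISTERED` dictionary, the `register` decorator, the `lookup` helper, and the `myfuncs.*` names are all hypothetical.

```python
# Hypothetical registry mapping a qualified name to a callable.
REGISTERED = {}


def register(name):
    """Decorator that records a function under the given name."""
    def wrap(fn):
        REGISTERED[name] = fn
        return fn
    return wrap


@register("myfuncs.square")
def square(x):
    # An eval-style function: one value in, one value out.
    return x * x


@register("myfuncs.parse_record")
def parse_record(key, value):
    # A load-style conversion: one (key, value) record in, one tuple out.
    return (key, value.upper())


def lookup(name):
    """Resolve any function, eval-style or load-style, through the same path."""
    try:
        return REGISTERED[name]
    except KeyError:
        raise LookupError("no registered function named %r" % name)


print(lookup("myfuncs.square")(3))
print(lookup("myfuncs.parse_record")("k", "v"))
```

        With a single lookup path, a load function would be referenced the same way as an eval function, which is the consistency the comment asks for.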

        John Meagher added a comment -

        Added patch 3 to address Alan's comments:

        • Changed the base class to work with any input type
        • Added sequence file specific implementation
        • Added unit test
        Alan Gates added a comment -

        The change to work for a generic InputFormat instead of just sequence files is to have your loader take its InputFormat as a constructor argument instead of hard-coding it. While in general InputFormats are not required to return Writables, I think it would be OK to say that an InputFormat used with this loader should return a Writable key and a Writable value.

        The next step for this will be to add a reference implementation and tests. A unit test that implements a loader in a scripting language would be fine.
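        The constructor-argument design described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the patch's code and not the Hadoop or Pig API: `ListInputFormat`, `ScriptedLoader`, and their methods are hypothetical stand-ins, and an in-memory list takes the place of a real input source.

```python
class ListInputFormat:
    """Hypothetical stand-in for an input format that yields (key, value) pairs."""

    def __init__(self, pairs):
        self.pairs = pairs

    def records(self):
        return iter(self.pairs)


class ScriptedLoader:
    """Loader that receives its input format instead of hard-coding one."""

    def __init__(self, input_format, convert):
        self.input_format = input_format  # any object exposing records()
        self.convert = convert            # scripted (key, value) -> tuple function

    def tuples(self):
        # Apply the scripted conversion to every record the format produces.
        for key, value in self.input_format.records():
            yield self.convert(key, value)


loader = ScriptedLoader(
    ListInputFormat([(1, "a"), (2, "b")]),
    lambda key, value: (key, value.upper()),
)
rows = list(loader.tuples())
print(rows)
```

        Because the loader only depends on the format producing key and value pairs, swapping in a different format would not require touching the loader or the conversion script.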

        John Meagher added a comment -

        It works with Sequence files because that's what I needed and the getInputFormat had to return something.

        Is there an example of something similar that will work with arbitrary InputFormats and RecordReaders?

        Alan Gates added a comment -

        Looks interesting. Extending the scripting UDFs to include load and store is a logical next step.

        One question: why tie this to sequence files? Why not let the user specify any input format that can return a key and value?

        John Meagher added a comment -

        Attaching initial 2 to fix something that came up when testing

        John Meagher added a comment -

        Initial cut of the scripted loader


          People

          • Assignee:
            John Meagher
            Reporter:
            John Meagher
          • Votes:
            0
            Watchers:
            5
