Pig / PIG-2529

Creation of a Python PiggyBank

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: piggybank
    • Labels:

      Description

      As per a conversation on the Pig user list, I think it makes sense to create a PiggyBank for Python functions. To get us started, here's something short and quick I wrote to convert a bag of single-item tuples to a single tuple:

      @outputSchema("t:tuple()")
      def bagToTuple(bag):
          # Take the only field of each tuple in the bag and pack them into one tuple.
          t = tuple([item[0] for item in bag])
          return t

        Activity

        Eli Finkelshteyn created issue -
        Prashant Kommireddi added a comment -

        Could this be extended to a generic UDF instead of handling only single-item tuples? By more generic, I mean it would be useful to extract any number of items from the inner tuples into a single tuple. Optionally, it could take an integer argument that specifies how many items from each inner tuple to extract.

        E.g.,

        {(1),(2),(3)} => (1,2,3)

        {(1,4),(2,5),(3,6)} => (1,4,2,5,3,6)

        If you pass an argument to the UDF, let's say because we want only the first element from each inner tuple, the output should be:

        {(1,4),(2,5),(3,6)} => (1,2,3)

        This way we provide some flexibility to users of the UDF.
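
        A minimal sketch of the generalized UDF described in this comment, in case it helps the discussion. The optional parameter n and the fallback decorator are assumptions made for illustration: under Pig's Jython engine outputSchema is expected to be defined already, and whether a Python default argument is honored when the UDF is called from a Pig script has not been verified here.

```python
# Stand-in for the decorator Pig supplies to Jython UDFs, so this sketch
# also runs under plain Python. Assumption: under Pig, outputSchema is
# already defined and this fallback is skipped.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("t:tuple()")
def bagToTuple(bag, n=None):
    """Flatten a bag of tuples into one tuple.

    n is a hypothetical optional argument: when given, only the first n
    fields of each inner tuple are kept.
    """
    items = []
    for inner in bag:
        # Keep the whole inner tuple, or just its first n fields.
        fields = inner if n is None else inner[:n]
        items.extend(fields)
    return tuple(items)
```

        With the examples from this comment, bagToTuple([(1,4),(2,5),(3,6)]) gives (1,4,2,5,3,6), and passing 1 as the second argument gives (1,2,3).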

        Eli Finkelshteyn added a comment -

        That's true. The original function worked for my original case, but something more general and useful would convert a bag of tuples to just a tuple of the items that were inside the tuples in the original bag (i.e. essentially flattening the original bag by one dimension). This converts:

        {(1,4),(2,5),(3,6)}

        => (1,4,2,5,3,6)

        The function that accomplishes this is:

        @outputSchema("t:tuple()")
        def bagToTuple(bag):
            # Concatenate every tuple in the bag, starting from the empty tuple.
            t = sum(bag, tuple([]))
            return t

        Russell Jurney added a comment -

        This is a good opportunity to move piggybank to github. Python UDFs can go here: https://github.com/wilbur/Piggybank/tree/master/src/main/python

        Fork and clone the project, submit a pull request with a test, and your python udf is in piggybank.

        Eli Finkelshteyn added a comment -

        Cool, happy to do it. I don't see a place for tests there though... Should I make a src/test/Python dir and throw it in there, or did you have something else in mind? This is my first contribution, so I'm a little fuzzy on what's expected behavior.

        Eli Finkelshteyn added a comment -

        Hey folks,
        I do really want to start adding Python UDFs to PiggyBank, but I'm still hazy on the tests question I asked above. Does anyone have an answer, so I could start sharing my udfs and hopefully help out?

        Dmitriy V. Ryaboy added a comment -

        Hi Eli,
        Yes, src/test/python in Wilbur would work. If you can work Python test automation into the build file, that would be really awesome. I am no good at Python, so unfortunately I can't advise on how best to go about setting that up.

        D
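
        One possible shape for a test under src/test/python, sketched with the standard unittest module. The class name and the run_tests helper are hypothetical, and the UDF is copied inline without the Pig decorator only to keep the sketch self-contained; a real test would import it from src/main/python instead.

```python
import unittest


def bagToTuple(bag):
    # Copy of the flattening UDF from this issue, minus the Pig decorator,
    # so the example is self-contained. A real test would import it.
    return sum(bag, tuple([]))


class BagToTupleTest(unittest.TestCase):
    def test_single_item_tuples(self):
        self.assertEqual(bagToTuple([(1,), (2,), (3,)]), (1, 2, 3))

    def test_multi_item_tuples(self):
        self.assertEqual(bagToTuple([(1, 4), (2, 5), (3, 6)]),
                         (1, 4, 2, 5, 3, 6))

    def test_empty_bag(self):
        self.assertEqual(bagToTuple([]), ())


def run_tests():
    # Hypothetical helper: run the suite and return the result object so a
    # caller (for example a build step) can check wasSuccessful().
    suite = unittest.TestLoader().loadTestsFromTestCase(BagToTupleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

        A build file could then invoke a Python interpreter on the test directory and fail the build when run_tests().wasSuccessful() is false; how that is wired into the Wilbur build is left open here.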


          People

          • Assignee: Unassigned
          • Reporter: Eli Finkelshteyn
          • Votes: 0
          • Watchers: 1
