  1. Pig
  2. PIG-2417

Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available
    • Hadoop Flags:
      Reviewed

      Description

      The goal of Streaming UDFs is to allow users to easily write UDFs in scripting languages with no JVM implementation or a limited JVM implementation. The initial proposal is outlined here: https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs.

      In order to implement this we need new syntax to distinguish a streaming UDF from an embedded JVM UDF. I'd propose something like the following (although I'm not sure 'language' is the best term to be using):

      define my_streaming_udfs language('python') ship('my_streaming_udfs.py')

      We'll also need a language-specific controller script, shipped to the cluster, that is responsible for reading the input stream, deserializing the input data, passing it to the user-written script, serializing that script's output, and writing it to the output stream.
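
      As a rough illustration of the controller's read/deserialize/call/serialize/write loop, here is a minimal Python sketch. The wire format (tab-delimited fields, one record per line) and every name in it (deserialize, serialize, run_controller, concat_upper) are assumptions made for illustration only; they are not taken from the proposal or from any of the attached patches.

```python
import sys


def deserialize(line):
    """Split one input record into string fields (assumed format: tab-delimited)."""
    return line.rstrip("\n").split("\t")


def serialize(value):
    """Render a UDF result as one output line; lists/tuples become tab-delimited fields."""
    if isinstance(value, (list, tuple)):
        return "\t".join(str(v) for v in value)
    return str(value)


def run_controller(udf, in_stream=sys.stdin, out_stream=sys.stdout):
    """Read records, pass each to the user-written function, and write the results."""
    for line in in_stream:
        fields = deserialize(line)
        result = udf(*fields)
        out_stream.write(serialize(result) + "\n")
        # Flush per record so the process on the other end of the pipe
        # is not left waiting on buffered output.
        out_stream.flush()


def concat_upper(first, second):
    """Hypothetical user-written UDF, standing in for a function in my_streaming_udfs.py."""
    return (first + second).upper()
```

      Passing the streams as parameters keeps the loop testable without a cluster: run_controller(concat_upper, some_input, some_output) works with any file-like objects, while the defaults use stdin and stdout.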

      Finally, we'll need to add a StreamingUDF class that extends EvalFunc. This class will likely share some of the existing code in POStream and ExecutableManager (where it makes sense to pull out shared code) to stream data to and from the controller script.

      One alternative approach to creating the StreamingUDF EvalFunc is to use the POStream operator directly. This would involve inserting the POStream operator instead of the POUserFunc operator whenever we encounter a streaming UDF while building the physical plan. This approach seemed problematic because it would require many changes to support POStream in all of the places where we want to be able to use UDFs (for example, to operate on a single field inside a foreach statement).

        Attachments

        1. PIG-2417-4.patch
          109 kB
          Jeremy Karn
        2. PIG-2417-5.patch
          154 kB
          Jeremy Karn
        3. PIG-2417-6.patch
          152 kB
          Jeremy Karn
        4. PIG-2417-7.patch
          172 kB
          Jeremy Karn
        5. PIG-2417-8.patch
          153 kB
          Jeremy Karn
        6. PIG-2417-9.patch
          169 kB
          Jeremy Karn
        7. PIG-2417-9-1.patch
          2 kB
          Daniel Dai
        8. PIG-2417-9-2.patch
          32 kB
          Jeremy Karn
        9. PIG-2417-e2e.patch
          15 kB
          Jeremy Karn
        10. PIG-2417-unicode.patch
          1 kB
          Jeremy Karn
        11. streaming.patch
          44 kB
          Jeremy Karn
        12. streaming2.patch
          115 kB
          Jeremy Karn
        13. streaming3.patch
          99 kB
          Jeremy Karn

              People

              • Assignee:
                jeremykarn Jeremy Karn
                Reporter:
                jeremykarn Jeremy Karn
              • Votes:
                5
                Watchers:
                9
