Pig / PIG-2417

Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available
    • Hadoop Flags:
      Reviewed

      Description

      The goal of Streaming UDFs is to allow users to easily write UDFs in scripting languages with no JVM implementation or a limited JVM implementation. The initial proposal is outlined here: https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs.

      In order to implement this we need new syntax to distinguish a streaming UDF from an embedded JVM UDF. I'd propose something like the following (although I'm not sure 'language' is the best term to be using):

      define my_streaming_udfs language('python') ship('my_streaming_udfs.py')

      We'll also need a language-specific controller script that gets shipped to the cluster. It is responsible for reading the input stream, deserializing the input data, passing it to the user-written script, serializing that script's output, and writing it to the output stream.
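
      The controller loop described above can be sketched roughly as follows. This is only an illustration, not the controller from the patches: the JSON-lines wire format and the names run_controller and word_length are assumptions made for the example.

```python
import io
import json


def run_controller(udf, in_stream, out_stream):
    """Read one serialized record per line, apply `udf`, write one result per line.

    JSON-lines is an assumed wire format, chosen for illustration only.
    """
    for line in in_stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        args = json.loads(line)      # deserialize the input fields
        result = udf(*args)          # pass them to the user-written function
        out_stream.write(json.dumps(result) + "\n")  # serialize and emit
        out_stream.flush()           # the caller is waiting on this result


def word_length(word):
    """Hypothetical user-written UDF."""
    return len(word)


if __name__ == "__main__":
    demo_in = io.StringIO('["pig"]\n["streaming"]\n')
    demo_out = io.StringIO()
    run_controller(word_length, demo_in, demo_out)
    print(demo_out.getvalue(), end="")
```

      In a real deployment the streams would presumably be the process's stdin and stdout rather than in-memory buffers; flushing after every record matters because the JVM side blocks until it sees each result.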

      Finally, we'll need to add a StreamingUDF class that extends EvalFunc. This class will likely share some of the existing code in POStream and ExecutableManager (where it makes sense to pull out shared code) to stream data to/from the controller script.
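
      To make the "stream data to/from the controller script" part concrete, here is a minimal sketch of the calling side. It is written in Python purely for illustration, whereas the proposed StreamingUDF would be a Java EvalFunc; StreamingCaller, CHILD_SOURCE, and the JSON-lines format are all hypothetical stand-ins and none of this is Pig code.

```python
import json
import subprocess
import sys

# Stand-in for the shipped controller script: replies with the length
# of each JSON-encoded string it receives (assumed behaviour for the demo).
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    sys.stdout.write(json.dumps(len(json.loads(line))) + "\n")
    sys.stdout.flush()
"""


class StreamingCaller:
    """Hypothetical helper that keeps one child process alive across calls."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes in text mode
        )

    def call(self, value):
        # Serialize the input, send it, then block until one result line arrives.
        self.proc.stdin.write(json.dumps(value) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        # Closing stdin signals end of input so the child can exit.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


if __name__ == "__main__":
    caller = StreamingCaller([sys.executable, "-c", CHILD_SOURCE])
    print(caller.call("pig"))
    caller.close()
```

      Keeping a single child process alive for the lifetime of the UDF, rather than spawning one per record, is the design choice this sketch is meant to show.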

      One alternative to creating the StreamingUDF EvalFunc is to use the POStream operator directly. This would involve inserting a POStream operator instead of a POUserFunc operator whenever we encountered a streaming UDF while building the physical plan. This approach seemed problematic because it would take a lot of changes to support POStream in all of the places where we want to be able to use UDFs (for example, to operate on a single field inside a foreach statement).

      1. PIG-2417-unicode.patch
        1 kB
        Jeremy Karn
      2. PIG-2417-9-2.patch
        32 kB
        Jeremy Karn
      3. PIG-2417-9-1.patch
        2 kB
        Daniel Dai
      4. PIG-2417-9.patch
        169 kB
        Jeremy Karn
      5. PIG-2417-e2e.patch
        15 kB
        Jeremy Karn
      6. PIG-2417-8.patch
        153 kB
        Jeremy Karn
      7. PIG-2417-7.patch
        172 kB
        Jeremy Karn
      8. PIG-2417-6.patch
        152 kB
        Jeremy Karn
      9. PIG-2417-5.patch
        154 kB
        Jeremy Karn
      10. PIG-2417-4.patch
        109 kB
        Jeremy Karn
      11. streaming3.patch
        99 kB
        Jeremy Karn
      12. streaming2.patch
        115 kB
        Jeremy Karn
      13. streaming.patch
        44 kB
        Jeremy Karn

        Issue Links

          Activity

          Jeremy Karn created issue -
          Jeremy Karn made changes -
          Field Original Value New Value
          Status Open [ 1 ] Patch Available [ 10002 ]
          Jeremy Karn made changes -
          Attachment python_streaming_string.patch [ 12507125 ]
          Jeremy Karn made changes -
          Attachment python_streaming_string.patch [ 12507125 ]
          Jeremy Karn made changes -
          Attachment python_streaming_string.patch [ 12507126 ]
          Alan Gates made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Jeremy Karn made changes -
          Attachment python_streaming_string.patch [ 12507126 ]
          Jeremy Karn made changes -
          Attachment streaming.patch [ 12507623 ]
          Ashutosh Chauhan made changes -
          Assignee Jeremy Karn [ jeremykarn ]
          Jeremy Karn made changes -
          Attachment streaming2.patch [ 12508535 ]
          Jeremy Karn made changes -
          Attachment streaming3.patch [ 12509992 ]
          Jeremy Karn made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Alan Gates made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Jeremy Karn made changes -
          Attachment PIG-2417-4.patch [ 12571279 ]
          Jeremy Karn made changes -
          Attachment PIG-2417-5.patch [ 12599706 ]
          Jeremy Karn made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Jeremy Karn made changes -
          Status In Progress [ 3 ] Open [ 1 ]
          Jeremy Karn made changes -
          Patch Info Patch Available [ 10042 ]
          Jeremy Karn made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Affects Version/s 0.12 [ 12323380 ]
          Affects Version/s 0.11 [ 12318878 ]
          Jeremy Karn made changes -
          Assignee Jeremy Karn [ jeremykarn ]
          Jeremy Karn made changes -
          Fix Version/s 0.12 [ 12323380 ]
          Jeremy Karn made changes -
          Attachment PIG-2417-6.patch [ 12601437 ]
          Jeremy Karn made changes -
          Attachment PIG-2417-7.patch [ 12601668 ]
          Jeremy Karn made changes -
          Attachment PIG-2417-8.patch [ 12601981 ]
          Jeremy Karn made changes -
          Attachment PIG-2417-e2e.patch [ 12602202 ]
          Rohini Palaniswamy made changes -
          Link This issue requires PIG-3255 [ PIG-3255 ]
          Jeremy Karn made changes -
          Attachment PIG-2417-9.patch [ 12603921 ]
          Daniel Dai made changes -
          Attachment PIG-2417-9-1.patch [ 12604153 ]
          Daniel Dai made changes -
          Attachment PIG-2417-9-1.patch [ 12604153 ]
          Daniel Dai made changes -
          Attachment PIG-2417-9-1.patch [ 12604156 ]
          Jeremy Karn made changes -
          Attachment PIG-2417-9-2.patch [ 12604176 ]
          Daniel Dai made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Assignee Jeremy Karn [ jeremykarn ]
          Resolution Fixed [ 1 ]
          Jeremy Karn made changes -
          Attachment PIG-2417-unicode.patch [ 12605089 ]
          Jeremy Karn made changes -
          Attachment PIG-3478.patch [ 12607596 ]
          Jeremy Karn made changes -
          Attachment PIG-3478.patch [ 12607596 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Jeremy Karn
              Reporter:
              Jeremy Karn
            • Votes:
              5
              Watchers:
              9

              Dates

              • Created:
                Updated:
                Resolved:
