Pig / PIG-506

Does pig need a NATIVE keyword?

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: impl
    • Labels:
    • Hadoop Flags:
      Reviewed

      Description

      Assume a user had a job that broke easily into three pieces. Further assume that pieces one and three were easily expressible in pig, but that piece two needed to be written in map reduce for whatever reason (performance, something that pig could not easily express, a legacy job that was too important to change, etc.). Today the user would have to either use map reduce for the entire job or manually stitch together the pig and map reduce jobs. What if instead pig provided a NATIVE keyword that would allow the script to pass off the data stream to the underlying system (in this case map reduce)? The semantics of NATIVE would vary by underlying system. In the map reduce case, we would assume that NATIVE indicated a collection of one or more fully contained map reduce jobs, so that pig would store the data, invoke the map reduce jobs, and then read the resulting data to continue. It might look something like this:

      A = load 'myfile';
      X = load 'myotherfile';
      B = group A by $0;
      C = foreach B generate group, myudf(B);
      D = native (jar=mymr.jar, infile=frompig, outfile=topig);
      E = join D by $0, X by $0;
      ...
      

      This differs from streaming in that it allows the user to insert an arbitrary amount of native processing, whereas streaming allows the insertion of only one binary. It also differs in that, for streaming, data is piped directly into and out of the binary as part of the pig pipeline. Here the pipeline would be broken: data would be written to disk, the native block invoked, and the data then read back from disk.
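
      The store, invoke, read-back sequence described above can be modelled with a small toy sketch. This is not pig code and not how any patch on this issue implements the feature. The names run_native, store, load and word_count_job are hypothetical, and the "native job" is a plain Python function standing in for the user's map reduce jar. Only the file names frompig and topig are taken from the example script.

```python
import csv
import tempfile
from pathlib import Path


def store(rows, path):
    """Write rows to disk as tab-separated text, breaking the in-memory pipeline."""
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(rows)


def load(path):
    """Read tab-separated rows back from disk so the script can continue."""
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f, delimiter="\t")]


def run_native(native_job, rows, workdir):
    """Store rows, hand the files to native_job, then load whatever it wrote."""
    infile = Path(workdir) / "frompig"   # data handed to the native block
    outfile = Path(workdir) / "topig"    # data the native block hands back
    store(rows, infile)
    native_job(infile, outfile)          # opaque to the caller: one job or several
    return load(outfile)


def word_count_job(infile, outfile):
    """Hypothetical stand-in for the user's jar: counts first-column values."""
    counts = {}
    for row in load(infile):
        counts[row[0]] = counts.get(row[0], 0) + 1
    store(sorted(counts.items()), outfile)


with tempfile.TemporaryDirectory() as workdir:
    result = run_native(word_count_job, [("a", "1"), ("b", "2"), ("a", "3")], workdir)
```

      In this sketch the caller only sees rows going in and rows coming out. Everything read back is plain text, which suggests the real feature would need some way to declare how the returned data should be interpreted.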

      Another alternative is to say this is unnecessary because the user can do the coordination from java, using the PigServer interface to run pig and calling the map reduce job explicitly. The advantages of the native keyword are that the user need not worry about coordination between the jobs, since pig will take care of it, and that the user can make use of existing java applications without being a java programmer.

      1. NativeImplInitial.patch (40 kB, Aniket Mokashi)
      2. NativeMapReduceFinale1.patch (53 kB, Aniket Mokashi)
      3. NativeMapReduceFinale2.patch (65 kB, Aniket Mokashi)
      4. NativeMapReduceFinale3.patch (66 kB, Aniket Mokashi)
      5. PIG-506.2.patch (42 kB, Thejas M Nair)
      6. PIG-506.3.patch (46 kB, Thejas M Nair)
      7. PIG-506.patch (68 kB, Thejas M Nair)
      8. TestWordCount.jar (3 kB, Aniket Mokashi)

        Activity

        Alan Gates created issue
        Daniel Dai made changes
        Field Original Value New Value
        Labels mentor gsoc
        Daniel Dai made changes
        Assignee Alan Gates [ alangates ] Aniket Mokashi [ aniket486 ]
        Olga Natkovich made changes
        Fix Version/s 0.8.0 [ 12314562 ]
        Aniket Mokashi made changes
        Attachment NativeImplInitial.patch [ 12451717 ]
        Aniket Mokashi made changes
        Attachment NativeMapReduceFinale1.patch [ 12452577 ]
        Aniket Mokashi made changes
        Status Open [ 1 ] Patch Available [ 10002 ]
        Aniket Mokashi made changes
        Attachment TestWordCount.jar [ 12452584 ]
        Aniket Mokashi made changes
        Attachment NativeMapReduceFinale2.patch [ 12452596 ]
        Aniket Mokashi made changes
        Attachment NativeMapReduceFinale3.patch [ 12452680 ]
        Aniket Mokashi made changes
        Assignee Aniket Mokashi [ aniket486 ] Thejas M Nair [ thejas ]
        Thejas M Nair made changes
        Assignee Thejas M Nair [ thejas ] Aniket Mokashi [ aniket486 ]
        Thejas M Nair made changes
        Attachment PIG-506.patch [ 12452847 ]
        Thejas M Nair made changes
        Attachment PIG-506.2.patch [ 12453140 ]
        Thejas M Nair made changes
        Attachment PIG-506.3.patch [ 12453192 ]
        Thejas M Nair made changes
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Olga Natkovich made changes
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Aniket Mokashi
            Reporter:
            Alan Gates
          • Votes:
            3
            Watchers:
            6
