Pig
  1. Pig
  2. PIG-1387

Syntactical Sugar for PIG-1385

    Details

    • Type: Wish Wish
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.10.0
    • Component/s: grunt
    • Labels:

      Description

      From this conversation, extend PIG-1385 to instead of calling UDF use built-in behavior when the (),{},[] groupings are encountered.

      > > What about making them part of the language using symbols?
      > >
      > > instead of
      > >
      > > foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
      > >
      > > have language support
      > >
      > > foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
      > >
      > > or even:
      > >
      > > foreach T generate ($0, $1, $2), {$3, $4, $5}, $6#$7, $8#$9, $10, $11;
      > >
      > >
      > > Is there reason not to do the second or third other than being more
      > > complicated?
      > >
      > > Certainly I'd volunteer to put the top implementation in to the util
      > > package and submit them for builtin's, but the latter syntactic candies
      > > seems more natural..
      > >
      > >
      > >
      > > On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates <gates@yahoo-inc.com> wrote:
      > >
      > >> The grouping package in piggybank is left over from back when Pig
      > allowed
      > >> users to define grouping functions (0.1). Functions like these should
      > go in
      > >> evaluation.util.
      > >>
      > >> However, I'd consider putting these in builtin (in main Pig) instead.
      > >> These are things everyone asks for and they seem like a reasonable
      > addition
      > >> to the core engine. This will be more of a burden to write (as we'll
      > hold
      > >> them to a higher standard) but of more use to people as well.
      > >>
      > >> Alan.
      > >>
      > >>
      > >> On Apr 19, 2010, at 12:53 PM, hc busy wrote:
      > >>
      > >> Some times I wonder... I mean, somebody went to the trouble of making a
      > >>> path
      > >>> called
      > >>>
      > >>> org.apache.pig.piggybank.grouping
      > >>>
      > >>> (where it seems like this code belong), but didn't check in any java
      > code
      > >>> into that package.
      > >>>
      > >>>
      > >>> Any comment about where to put this kind of utility classes?
      > >>>
      > >>>
      > >>>
      > >>> On Mon, Apr 19, 2010 at 12:07 PM, Andrey S <octo47@gmail.com> wrote:
      > >>>
      > >>> 2010/4/19 hc busy <hc.busy@gmail.com>
      > >>>>
      > >>>> That's just the way it is right now, you can't make bags or tuples
      > >>>>> directly... Maybe we should have some UDF's in piggybank for these:
      > >>>>>
      > >>>>> toBag()
      > >>>>> toTuple(); --which is kinda like exec(Tuple in)

      {return in;}

      > >>>>> TupleToBag(); --some times you need it this way for some reason.
      > >>>>>
      > >>>>>
      > >>>>> Ok. I place my current code here, may be later I make a patch (if
      > such
      > >>>> implementation is acceptable of course).
      > >>>>
      > >>>> import org.apache.pig.EvalFunc;
      > >>>> import org.apache.pig.data.BagFactory;
      > >>>> import org.apache.pig.data.DataBag;
      > >>>> import org.apache.pig.data.Tuple;
      > >>>> import org.apache.pig.data.TupleFactory;
      > >>>>
      > >>>> import java.io.IOException;
      > >>>>
      > >>>> /**
      > >>>> * Convert any sequence of fields to bag with specified count of
      > >>>> fields<br>
      > >>>> * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
      > >>>> * Output: count=2, then

      { (fld1, fld2) , (fld3, fld4) ... }

      > >>>> *
      > >>>> * @author astepachev
      > >>>> */
      > >>>> public class ToBag extends EvalFunc<DataBag> {
      > >>>> public BagFactory bagFactory;
      > >>>> public TupleFactory tupleFactory;
      > >>>>
      > >>>> public ToBag()

      { > >>>> bagFactory = BagFactory.getInstance(); > >>>> tupleFactory = TupleFactory.getInstance(); > >>>> }

      > >>>>
      > >>>> @Override
      > >>>> public DataBag exec(Tuple input) throws IOException {
      > >>>> if (input.isNull())
      > >>>> return null;
      > >>>> final DataBag bag = bagFactory.newDefaultBag();
      > >>>> final Integer couter = (Integer) input.get(0);
      > >>>> if (couter == null)
      > >>>> return null;
      > >>>> Tuple tuple = tupleFactory.newTuple();
      > >>>> for (int i = 0; i < input.size() - 1; i++) {
      > >>>> if (i % couter == 0)

      { > >>>> tuple = tupleFactory.newTuple(); > >>>> bag.add(tuple); > >>>> }

      > >>>> tuple.append(input.get(i + 1));
      > >>>> }
      > >>>> return bag;
      > >>>> }
      > >>>> }
      > >>>>
      > >>>> import org.apache.pig.ExecType;
      > >>>> import org.apache.pig.PigServer;
      > >>>> import org.junit.Before;
      > >>>> import org.junit.Test;
      > >>>>
      > >>>> import java.io.IOException;
      > >>>> import java.net.URISyntaxException;
      > >>>> import java.net.URL;
      > >>>>
      > >>>> import static org.junit.Assert.assertTrue;
      > >>>>
      > >>>> /**
      > >>>> * @author astepachev
      > >>>> */
      > >>>> public class ToBagTest {
      > >>>> PigServer pigServer;
      > >>>> URL inputTxt;
      > >>>>
      > >>>> @Before
      > >>>> public void init() throws IOException, URISyntaxException

      { > >>>> pigServer = new PigServer(ExecType.LOCAL); > >>>> inputTxt = > >>>> this.getClass().getResource("bagTest.txt").toURI().toURL(); > >>>> }

      > >>>>
      > >>>> @Test
      > >>>> public void testSimple() throws IOException

      { > >>>> pigServer.registerQuery("a = load '" + inputTxt.toExternalForm() > + > >>>> "' using PigStorage(',') " + > >>>> "as (id:int, a:chararray, b:chararray, c:chararray, > >>>> d:chararray);"); > >>>> pigServer.registerQuery("last = foreach a generate flatten(" + > >>>> ToBag.class.getName() + "(2, id, a, id, b, id, c));"); > >>>> > >>>> pigServer.deleteFile("target/pigtest/func1.txt"); > >>>> pigServer.store("last", "target/pigtest/func1.txt"); > >>>> assertTrue(pigServer.fileSize("target/pigtest/func1.txt") > 0); > >>>> }

      > >>>> }
      > >>>>
      > >>>>
      > >>
      > >
      >

      This is a candidate project for Google summer of code 2011. More information about the program can be found at http://wiki.apache.org/pig/GSoc2011

      1. PIG-1387.4.patch
        7 kB
        Gianmarco De Francisci Morales
      2. PIG-1387.3.patch
        4 kB
        Gianmarco De Francisci Morales
      3. PIG-1387.2.patch
        3 kB
        Gianmarco De Francisci Morales
      4. PIG-1387.1.patch
        3 kB
        Gianmarco De Francisci Morales

        Issue Links

          Activity

          hc busy created issue -
          hc busy made changes -
          Field Original Value New Value
          Link This issue relates to PIG-1385 [ PIG-1385 ]
          Olga Natkovich made changes -
          Fix Version/s 0.9.0 [ 12315191 ]
          Alan Gates made changes -
          Assignee Xuefu Zhang [ xuefuz ]
          Alan Gates made changes -
          Link This issue depends upon PIG-1618 [ PIG-1618 ]
          Olga Natkovich made changes -
          Fix Version/s 0.10 [ 12316246 ]
          Fix Version/s 0.9.0 [ 12315191 ]
          Daniel Dai made changes -
          Labels gsoc2011
          Description From this conversation, extend PIG-1385 to instead of calling UDF use built-in behavior when the (),{},[] groupings are encountered.


          > > What about making them part of the language using symbols?
          > >
          > > instead of
          > >
          > > foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
          > >
          > > have language support
          > >
          > > foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
          > >
          > > or even:
          > >
          > > foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
          > >
          > >
          > > Is there reason not to do the second or third other than being more
          > > complicated?
          > >
          > > Certainly I'd volunteer to put the top implementation in to the util
          > > package and submit them for builtin's, but the latter syntactic candies
          > > seems more natural..
          > >
          > >
          > >
          > > On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates <gates@yahoo-inc.com> wrote:
          > >
          > >> The grouping package in piggybank is left over from back when Pig
          > allowed
          > >> users to define grouping functions (0.1). Functions like these should
          > go in
          > >> evaluation.util.
          > >>
          > >> However, I'd consider putting these in builtin (in main Pig) instead.
          > >> These are things everyone asks for and they seem like a reasonable
          > addition
          > >> to the core engine. This will be more of a burden to write (as we'll
          > hold
          > >> them to a higher standard) but of more use to people as well.
          > >>
          > >> Alan.
          > >>
          > >>
          > >> On Apr 19, 2010, at 12:53 PM, hc busy wrote:
          > >>
          > >> Some times I wonder... I mean, somebody went to the trouble of making a
          > >>> path
          > >>> called
          > >>>
          > >>> org.apache.pig.piggybank.grouping
          > >>>
          > >>> (where it seems like this code belong), but didn't check in any java
          > code
          > >>> into that package.
          > >>>
          > >>>
          > >>> Any comment about where to put this kind of utility classes?
          > >>>
          > >>>
          > >>>
          > >>> On Mon, Apr 19, 2010 at 12:07 PM, Andrey S <octo47@gmail.com> wrote:
          > >>>
          > >>> 2010/4/19 hc busy <hc.busy@gmail.com>
          > >>>>
          > >>>> That's just the way it is right now, you can't make bags or tuples
          > >>>>> directly... Maybe we should have some UDF's in piggybank for these:
          > >>>>>
          > >>>>> toBag()
          > >>>>> toTuple(); --which is kinda like exec(Tuple in){return in;}
          > >>>>> TupleToBag(); --some times you need it this way for some reason.
          > >>>>>
          > >>>>>
          > >>>>> Ok. I place my current code here, may be later I make a patch (if
          > such
          > >>>> implementation is acceptable of course).
          > >>>>
          > >>>> import org.apache.pig.EvalFunc;
          > >>>> import org.apache.pig.data.BagFactory;
          > >>>> import org.apache.pig.data.DataBag;
          > >>>> import org.apache.pig.data.Tuple;
          > >>>> import org.apache.pig.data.TupleFactory;
          > >>>>
          > >>>> import java.io.IOException;
          > >>>>
          > >>>> /**
          > >>>> * Convert any sequence of fields to bag with specified count of
          > >>>> fields<br>
          > >>>> * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
          > >>>> * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
          > >>>> *
          > >>>> * @author astepachev
          > >>>> */
          > >>>> public class ToBag extends EvalFunc<DataBag> {
          > >>>> public BagFactory bagFactory;
          > >>>> public TupleFactory tupleFactory;
          > >>>>
          > >>>> public ToBag() {
          > >>>> bagFactory = BagFactory.getInstance();
          > >>>> tupleFactory = TupleFactory.getInstance();
          > >>>> }
          > >>>>
          > >>>> @Override
          > >>>> public DataBag exec(Tuple input) throws IOException {
          > >>>> if (input.isNull())
          > >>>> return null;
          > >>>> final DataBag bag = bagFactory.newDefaultBag();
          > >>>> final Integer couter = (Integer) input.get(0);
          > >>>> if (couter == null)
          > >>>> return null;
          > >>>> Tuple tuple = tupleFactory.newTuple();
          > >>>> for (int i = 0; i < input.size() - 1; i++) {
          > >>>> if (i % couter == 0) {
          > >>>> tuple = tupleFactory.newTuple();
          > >>>> bag.add(tuple);
          > >>>> }
          > >>>> tuple.append(input.get(i + 1));
          > >>>> }
          > >>>> return bag;
          > >>>> }
          > >>>> }
          > >>>>
          > >>>> import org.apache.pig.ExecType;
          > >>>> import org.apache.pig.PigServer;
          > >>>> import org.junit.Before;
          > >>>> import org.junit.Test;
          > >>>>
          > >>>> import java.io.IOException;
          > >>>> import java.net.URISyntaxException;
          > >>>> import java.net.URL;
          > >>>>
          > >>>> import static org.junit.Assert.assertTrue;
          > >>>>
          > >>>> /**
          > >>>> * @author astepachev
          > >>>> */
          > >>>> public class ToBagTest {
          > >>>> PigServer pigServer;
          > >>>> URL inputTxt;
          > >>>>
          > >>>> @Before
          > >>>> public void init() throws IOException, URISyntaxException {
          > >>>> pigServer = new PigServer(ExecType.LOCAL);
          > >>>> inputTxt =
          > >>>> this.getClass().getResource("bagTest.txt").toURI().toURL();
          > >>>> }
          > >>>>
          > >>>> @Test
          > >>>> public void testSimple() throws IOException {
          > >>>> pigServer.registerQuery("a = load '" + inputTxt.toExternalForm()
          > +
          > >>>> "' using PigStorage(',') " +
          > >>>> "as (id:int, a:chararray, b:chararray, c:chararray,
          > >>>> d:chararray);");
          > >>>> pigServer.registerQuery("last = foreach a generate flatten(" +
          > >>>> ToBag.class.getName() + "(2, id, a, id, b, id, c));");
          > >>>>
          > >>>> pigServer.deleteFile("target/pigtest/func1.txt");
          > >>>> pigServer.store("last", "target/pigtest/func1.txt");
          > >>>> assertTrue(pigServer.fileSize("target/pigtest/func1.txt") > 0);
          > >>>> }
          > >>>> }
          > >>>>
          > >>>>
          > >>
          > >
          >
          From this conversation, extend PIG-1385 to instead of calling UDF use built-in behavior when the (),{},[] groupings are encountered.


          > > What about making them part of the language using symbols?
          > >
          > > instead of
          > >
          > > foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
          > >
          > > have language support
          > >
          > > foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
          > >
          > > or even:
          > >
          > > foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
          > >
          > >
          > > Is there reason not to do the second or third other than being more
          > > complicated?
          > >
          > > Certainly I'd volunteer to put the top implementation in to the util
          > > package and submit them for builtin's, but the latter syntactic candies
          > > seems more natural..
          > >
          > >
          > >
          > > On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates <gates@yahoo-inc.com> wrote:
          > >
          > >> The grouping package in piggybank is left over from back when Pig
          > allowed
          > >> users to define grouping functions (0.1). Functions like these should
          > go in
          > >> evaluation.util.
          > >>
          > >> However, I'd consider putting these in builtin (in main Pig) instead.
          > >> These are things everyone asks for and they seem like a reasonable
          > addition
          > >> to the core engine. This will be more of a burden to write (as we'll
          > hold
          > >> them to a higher standard) but of more use to people as well.
          > >>
          > >> Alan.
          > >>
          > >>
          > >> On Apr 19, 2010, at 12:53 PM, hc busy wrote:
          > >>
          > >> Some times I wonder... I mean, somebody went to the trouble of making a
          > >>> path
          > >>> called
          > >>>
          > >>> org.apache.pig.piggybank.grouping
          > >>>
          > >>> (where it seems like this code belong), but didn't check in any java
          > code
          > >>> into that package.
          > >>>
          > >>>
          > >>> Any comment about where to put this kind of utility classes?
          > >>>
          > >>>
          > >>>
          > >>> On Mon, Apr 19, 2010 at 12:07 PM, Andrey S <octo47@gmail.com> wrote:
          > >>>
          > >>> 2010/4/19 hc busy <hc.busy@gmail.com>
          > >>>>
          > >>>> That's just the way it is right now, you can't make bags or tuples
          > >>>>> directly... Maybe we should have some UDF's in piggybank for these:
          > >>>>>
          > >>>>> toBag()
          > >>>>> toTuple(); --which is kinda like exec(Tuple in){return in;}
          > >>>>> TupleToBag(); --some times you need it this way for some reason.
          > >>>>>
          > >>>>>
          > >>>>> Ok. I place my current code here, may be later I make a patch (if
          > such
          > >>>> implementation is acceptable of course).
          > >>>>
          > >>>> import org.apache.pig.EvalFunc;
          > >>>> import org.apache.pig.data.BagFactory;
          > >>>> import org.apache.pig.data.DataBag;
          > >>>> import org.apache.pig.data.Tuple;
          > >>>> import org.apache.pig.data.TupleFactory;
          > >>>>
          > >>>> import java.io.IOException;
          > >>>>
          > >>>> /**
          > >>>> * Convert any sequence of fields to bag with specified count of
          > >>>> fields<br>
          > >>>> * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
          > >>>> * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
          > >>>> *
          > >>>> * @author astepachev
          > >>>> */
          > >>>> public class ToBag extends EvalFunc<DataBag> {
          > >>>> public BagFactory bagFactory;
          > >>>> public TupleFactory tupleFactory;
          > >>>>
          > >>>> public ToBag() {
          > >>>> bagFactory = BagFactory.getInstance();
          > >>>> tupleFactory = TupleFactory.getInstance();
          > >>>> }
          > >>>>
          > >>>> @Override
          > >>>> public DataBag exec(Tuple input) throws IOException {
          > >>>> if (input.isNull())
          > >>>> return null;
          > >>>> final DataBag bag = bagFactory.newDefaultBag();
          > >>>> final Integer couter = (Integer) input.get(0);
          > >>>> if (couter == null)
          > >>>> return null;
          > >>>> Tuple tuple = tupleFactory.newTuple();
          > >>>> for (int i = 0; i < input.size() - 1; i++) {
          > >>>> if (i % couter == 0) {
          > >>>> tuple = tupleFactory.newTuple();
          > >>>> bag.add(tuple);
          > >>>> }
          > >>>> tuple.append(input.get(i + 1));
          > >>>> }
          > >>>> return bag;
          > >>>> }
          > >>>> }
          > >>>>
          > >>>> import org.apache.pig.ExecType;
          > >>>> import org.apache.pig.PigServer;
          > >>>> import org.junit.Before;
          > >>>> import org.junit.Test;
          > >>>>
          > >>>> import java.io.IOException;
          > >>>> import java.net.URISyntaxException;
          > >>>> import java.net.URL;
          > >>>>
          > >>>> import static org.junit.Assert.assertTrue;
          > >>>>
          > >>>> /**
          > >>>> * @author astepachev
          > >>>> */
          > >>>> public class ToBagTest {
          > >>>> PigServer pigServer;
          > >>>> URL inputTxt;
          > >>>>
          > >>>> @Before
          > >>>> public void init() throws IOException, URISyntaxException {
          > >>>> pigServer = new PigServer(ExecType.LOCAL);
          > >>>> inputTxt =
          > >>>> this.getClass().getResource("bagTest.txt").toURI().toURL();
          > >>>> }
          > >>>>
          > >>>> @Test
          > >>>> public void testSimple() throws IOException {
          > >>>> pigServer.registerQuery("a = load '" + inputTxt.toExternalForm()
          > +
          > >>>> "' using PigStorage(',') " +
          > >>>> "as (id:int, a:chararray, b:chararray, c:chararray,
          > >>>> d:chararray);");
          > >>>> pigServer.registerQuery("last = foreach a generate flatten(" +
          > >>>> ToBag.class.getName() + "(2, id, a, id, b, id, c));");
          > >>>>
          > >>>> pigServer.deleteFile("target/pigtest/func1.txt");
          > >>>> pigServer.store("last", "target/pigtest/func1.txt");
          > >>>> assertTrue(pigServer.fileSize("target/pigtest/func1.txt") > 0);
          > >>>> }
          > >>>> }
          > >>>>
          > >>>>
          > >>
          > >
          >

          This is a candidate project for Google summer of code 2011. More information about the program can be found at http://wiki.apache.org/pig/GSoc2011
          Daniel Dai made changes -
          Assignee Xuefu Zhang [ xuefuz ]
          Dmitriy V. Ryaboy made changes -
          Link This issue is related to PIG-2038 [ PIG-2038 ]
          Gianmarco De Francisci Morales made changes -
          Attachment PIG-1387.1.patch [ 12489370 ]
          Gianmarco De Francisci Morales made changes -
          Attachment PIG-1387.2.patch [ 12489477 ]
          Daniel Dai made changes -
          Link This issue relates to PIG-2082 [ PIG-2082 ]
          Gianmarco De Francisci Morales made changes -
          Assignee Gianmarco De Francisci Morales [ azaroth ]
          Gianmarco De Francisci Morales made changes -
          Attachment PIG-1387.3.patch [ 12494051 ]
          Gianmarco De Francisci Morales made changes -
          Attachment PIG-1387.4.patch [ 12499523 ]
          Gianmarco De Francisci Morales made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Gianmarco De Francisci Morales made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Gianmarco De Francisci Morales
              Reporter:
              hc busy
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development