Pig / PIG-3591

Refactor POPackage to separate MR specific code from packaging

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      POPackage is currently tightly coupled to the MR shuffle semantics. This makes it difficult to adapt the various subclasses of POPackage to other execution engines without duplicating code.

      1. PIG-3591.5.patch
        256 kB
        Mark Wagner
      2. PIG-3591.2.patch
        230 kB
        Mark Wagner
      3. PIG-3591.1.patch
        230 kB
        Mark Wagner

          Activity

          Mark Wagner added a comment -

          Separate "packaging" logic from "shuffle handling" logic. This moves the packaging logic to a new class "Packager", which is extended by CombinePackager, LitePackager, MultiQueryPackager, and JoinPackager.

          This is not finished. Known problems: illustrate and streaming the last input are not implemented yet.
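
          The split described in this comment can be sketched roughly as follows. This is only an illustration of the idea, not the code in the patch: the names SketchPackager, DedupSketchPackager, and SketchPackageOperator, and all method signatures, are hypothetical and much simpler than Pig's real classes. The operator owns the engine-specific handling of keyed input, and delegates the building of the output record to a packager that can be subclassed independently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical, simplified stand-ins. These are NOT Pig's real classes or signatures.

/** Engine-independent packaging: turns a key and its grouped values into one record. */
class SketchPackager {
    List<Object> pack(Object key, Iterator<Object> values) {
        List<Object> bag = new ArrayList<>();
        while (values.hasNext()) {
            bag.add(values.next());
        }
        // Output record is (key, bag of values).
        return Arrays.asList(key, bag);
    }
}

/** A variant overrides only the packaging step; here it drops duplicate values. */
class DedupSketchPackager extends SketchPackager {
    @Override
    List<Object> pack(Object key, Iterator<Object> values) {
        LinkedHashSet<Object> seen = new LinkedHashSet<>();
        while (values.hasNext()) {
            seen.add(values.next());
        }
        return Arrays.asList(key, new ArrayList<>(seen));
    }
}

/** Engine-specific side: knows how keyed input arrives, delegates record building. */
class SketchPackageOperator {
    private final SketchPackager packager;

    SketchPackageOperator(SketchPackager packager) {
        this.packager = packager;
    }

    // Stand-in for shuffle handling: input arrives as a key plus an in-memory iterable.
    List<Object> onShuffleInput(Object key, Iterable<Object> values) {
        return packager.pack(key, values.iterator());
    }
}
```

          With this shape, another execution engine would only need its own operator-side input handling, and could reuse any packager variant unchanged.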

          Mark Wagner added a comment -

          Here's the latest patch.

          Cheolsoo Park added a comment -

          +1. Thank you so much for the great work! I have one minor comment below.

          When you commit, do you mind fixing the following test case? Shouldn't "distinct A" be "order A by x"? I see another testDistinct() in the test suite that tests distinct, so it looks like this test case is supposed to test order by. Please correct me if I am wrong.

          TestExampleGenerator.java
          +    @Test
          +    public void testOrderBy() throws Exception {
          +        PigServer pigServer = new PigServer(pigContext);
          +        pigServer.registerQuery("A = load " + A.toString() + " as (x, y);");
          +        pigServer.registerQuery("B = distinct A;");
          +        Map<Operator, DataBag> derivedData = pigServer.getExamples("B");
          +
                   assertNotNull(derivedData);
               }
          
          Cheolsoo Park added a comment -

          I confirmed that all unit tests pass myself, so I went ahead and committed the final patch (+ my comment above) to trunk. I will merge it down to the tez branch later today.


            People

            • Assignee:
              Mark Wagner
              Reporter:
              Mark Wagner
            • Votes:
              0
              Watchers:
              3
