Pig
  1. Pig
  2. PIG-2651

Provide a much easier to use accumulator interface

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This introduces a new interface, IteratingAccumulatorEvalFunc (that name is NOT final...). The cool thing about this patch is that it is built purely on top of the existing Accumulator code (well, it uses PIG-2066, but it could easily work without it). That is to say, it's an easier way to write accumulators without having to fork the Pig codebase.

      The downside is that the only way I am able to provide such a clean interface is by using a second thread. I need to explore any potential performance implications, but given that most of the easy to use Pig stuff has performance implications, I think as long as we measure and and document them, it's worth the much more usable interface. Plus I don't think it will be too bad as one thread does the heavy lifting, while another just ferries values in between. SUM could now be written as:

      public class SUM extends IteratingAccumulatorEvalFunc<Long> {
          public Long exec(Iterator<Tuple> it) throws IOException {
              long sum = 0;
      
              while (it.hasNext()) {
                  sum += (Long)it.next().get(0);
              }
      
              return sum;
          }
      }
      

      Besides performance tests, I need to figure out how to properly test this sort of thing. I particularly welcome advice on that front.

      1. PIG-2651-0.patch
        16 kB
        Jonathan Coveney
      2. PIG-2651-1.patch
        31 kB
        Jonathan Coveney
      3. PIG-2651-2.patch
        14 kB
        Jonathan Coveney

        Issue Links

          Activity

          Hide
          Jonathan Coveney added a comment -

          Thanks Daniel!

          Show
          Jonathan Coveney added a comment - Thanks Daniel!
          Hide
          Daniel Dai added a comment -

          +1 for the patch.

          Committed to trunk. I don't feel it should be backported to 0.10 branch. Drop the 0.10 from fix version.

          Show
          Daniel Dai added a comment - +1 for the patch. Committed to trunk. I don't feel it should be backported to 0.10 branch. Drop the 0.10 from fix version.
          Hide
          Jonathan Coveney added a comment -

          rebased the patch

          Show
          Jonathan Coveney added a comment - rebased the patch
          Hide
          Jonathan Coveney added a comment -

          Woosh, PIG2066 is in, so I'd love a more thorough review of this specific proposal

          Show
          Jonathan Coveney added a comment - Woosh, PIG2066 is in, so I'd love a more thorough review of this specific proposal
          Hide
          Jonathan Coveney added a comment -

          Please find attached a patch with tests. Note: in the process of adding the tests, I ran into this:
          https://issues.apache.org/jira/browse/PIG-2694
          It's not blocking, but something to consider...

          Also: this patch includes the contents of https://issues.apache.org/jira/browse/PIG-2066. The new files are:

          .../apache/pig/IteratingAccumulatorEvalFunc.java
          .../udf/evalfunc/IteratingAccumulatorCount.java
          .../udf/evalfunc/IteratingAccumulatorIsEmpty.java
          .../test/udf/evalfunc/IteratingAccumulatorSum.java

          And of course, new e2e tests in nightly.conf

          Show
          Jonathan Coveney added a comment - Please find attached a patch with tests. Note: in the process of adding the tests, I ran into this: https://issues.apache.org/jira/browse/PIG-2694 It's not blocking, but something to consider... Also: this patch includes the contents of https://issues.apache.org/jira/browse/PIG-2066 . The new files are: .../apache/pig/IteratingAccumulatorEvalFunc.java .../udf/evalfunc/IteratingAccumulatorCount.java .../udf/evalfunc/IteratingAccumulatorIsEmpty.java .../test/udf/evalfunc/IteratingAccumulatorSum.java And of course, new e2e tests in nightly.conf
          Hide
          Jonathan Coveney added a comment -

          Alan, can you take a look at PIG-2066? That has the fundamental TerminatingAccumulator work, and I'd like to keep the testing/code for that there, and have this patch focus on the IteratingAccumulatorEvalFunc interface once that is finished. I have javadocs, but am not sure what stability and audience annotations to add. For TerminatingAccumulator, I think it could be considered public and stable...for IteratingAccumulatorEvalFunc, Public and Evolving?

          Show
          Jonathan Coveney added a comment - Alan, can you take a look at PIG-2066 ? That has the fundamental TerminatingAccumulator work, and I'd like to keep the testing/code for that there, and have this patch focus on the IteratingAccumulatorEvalFunc interface once that is finished. I have javadocs, but am not sure what stability and audience annotations to add. For TerminatingAccumulator, I think it could be considered public and stable...for IteratingAccumulatorEvalFunc, Public and Evolving?
          Hide
          Alan Gates added a comment -

          In general looks good. This will be great for performance in certain situations. A few issues:

          • The new files need Apache License headers.
          • The new files need javadocs.
          • The new files need stability and audience annotations.
          • We need tests. In particular it should test that this works stand alone, that multiple work together, and that it works with other non-TerminatingAccumulator accumulators.
          Show
          Alan Gates added a comment - In general looks good. This will be great for performance in certain situations. A few issues: The new files need Apache License headers. The new files need javadocs. The new files need stability and audience annotations. We need tests. In particular it should test that this works stand alone, that multiple work together, and that it works with other non-TerminatingAccumulator accumulators.

            People

            • Assignee:
              Jonathan Coveney
              Reporter:
              Jonathan Coveney
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development