Pig / PIG-2286

Using COR function in Piggybank results in ERROR 2018: Internal error. Unable to introduce the combiner for optimization

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.1, 0.10.0
    • Component/s: impl, piggybank
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Using the COR function in a Pig script results in an error. The input file "studenttab5" contains name, age, and gpa fields separated by tabs.

      register /home/viraj/pig-svn/trunk/contrib/piggybank/java/piggybank.jar;
      A = LOAD '/user/viraj/studenttab5' AS (name, age:double,gpa:double);
      B = group A all;
      C = foreach B generate group, COR(A.age, A.gpa);
      dump C;
      

      2011-09-14 17:03:22,001 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
      2011-09-14 17:03:22,088 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
      2011-09-14 17:03:22,960 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
      2011-09-14 17:03:23,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
      2011-09-14 17:03:23,179 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
      2011-09-14 17:03:23,186 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2018: Internal error. Unable to introduce the combiner for optimization.

      Viraj

      1. PIG-2286-2.patch
        6 kB
        Daniel Dai
      2. PIG-2286-1.patch
        5 kB
        Daniel Dai

        Activity

        Daniel Dai added a comment -

        Patch committed to both trunk and 0.9 branch.

        Thejas M Nair added a comment -

        +1

        Daniel Dai added a comment -

        PIG-2286-2.patch addresses Thejas's review comments.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1929/#review1974
        -----------------------------------------------------------

        trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java
        <https://reviews.apache.org/r/1929/#comment4462>

        I think a comment will be useful -
        // The algebraic udf can have more than one input. Add the udf only once

        trunk/src/org/apache/pig/builtin/COR.java
        <https://reviews.apache.org/r/1929/#comment4463>

        The size of the tuple would need to be size*(size-1).
        Details -
        With n inputs, the inner loop is executed (n-1) + (n-2) + ... + 1 = n(n-1)/2 times.
        Each time the inner loop is executed, two columns are added, so the total is 2 * n(n-1)/2 = n(n-1).
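
        The counting argument above can be checked with a small sketch. It assumes only what the comment describes: a nested loop over every unordered pair of n inputs that appends two columns per pair. The class and method names are hypothetical and are not taken from COR.java.

```java
// Hypothetical sketch, not the actual COR.java code: counts the columns
// produced when two values are appended for every unordered pair (i, j), i < j.
final class PairColumnCount {

    // Simulates the nested loop described in the review comment.
    static int countByLoop(int n) {
        int columns = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                columns += 2; // two columns added per inner-loop iteration
            }
        }
        return columns;
    }

    // Closed form from the review comment: 2 * n(n-1)/2 = n(n-1).
    static int closedForm(int n) {
        return n * (n - 1);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println("n=" + n
                    + " loop=" + countByLoop(n)
                    + " formula=" + closedForm(n));
        }
    }
}
```

        For every n the two methods should agree, which is why a tuple pre-sized to anything smaller than n(n-1) would be too short for such a loop.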

        trunk/src/org/apache/pig/builtin/COR.java
        <https://reviews.apache.org/r/1929/#comment4464>

        I don't understand why the values are being added to a tuple as columns. That does not look right.

        - Thejas

        On 2011-09-16 18:11:08, Daniel Dai wrote:

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1929/
        -----------------------------------------------------------

        (Updated 2011-09-16 18:11:08)

        Review request for pig and Thejas Nair.

        Summary
        -------

        See PIG-2286

        This addresses bug PIG-2286.
        https://issues.apache.org/jira/browse/PIG-2286

        Diffs
        -----

        trunk/src/org/apache/pig/builtin/COR.java 1171325
        trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java 1171325
        trunk/test/e2e/pig/tests/nightly.conf 1171325

        Diff: https://reviews.apache.org/r/1929/diff

        Testing
        -------

        Unit-test:
        all pass

        Piggybank-test:
        TestDBStorage fails for another reason, unrelated to the patch

        Test-patch:
        [exec] +1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 3 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

        Thanks,

        Daniel

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1929/
        -----------------------------------------------------------

        Review request for pig and Thejas Nair.

        Summary
        -------

        See PIG-2286

        This addresses bug PIG-2286.
        https://issues.apache.org/jira/browse/PIG-2286

        Diffs
        -----

        trunk/src/org/apache/pig/builtin/COR.java 1171325
        trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java 1171325
        trunk/test/e2e/pig/tests/nightly.conf 1171325

        Diff: https://reviews.apache.org/r/1929/diff

        Testing
        -------

        Unit-test:
        all pass

        Piggybank-test:
        TestDBStorage fails for another reason, unrelated to the patch

        Test-patch:
        [exec] +1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 3 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

        Thanks,

        Daniel

        Daniel Dai added a comment -

        I discovered two issues:
        1. CombinerOptimizer does not handle a UDF with two inputs.
        2. The Algebraic version of COR does not seem to work.

        I am working on a patch.
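
        For context on the second issue: an Algebraic UDF has to be computable in stages (Pig's Algebraic interface names them Initial, Intermed and Final) so that partial results can be merged in the combiner. A minimal sketch of how a Pearson correlation can be staged this way is given here. It is not the code in COR.java or in the patch; the class name, method names and the use of plain arrays are assumptions made for illustration.

```java
// Hypothetical sketch of an algebraic Pearson correlation. The class and
// method names are illustrative and are not taken from Pig's COR.java.
final class CorrelationSums {
    final long n;
    final double sumX, sumY, sumXY, sumXX, sumYY;

    CorrelationSums(long n, double sumX, double sumY,
                    double sumXY, double sumXX, double sumYY) {
        this.n = n;
        this.sumX = sumX;
        this.sumY = sumY;
        this.sumXY = sumXY;
        this.sumXX = sumXX;
        this.sumYY = sumYY;
    }

    // "Initial" stage: summarize one chunk of paired values.
    static CorrelationSums initial(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("columns must have equal length");
        }
        double sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            sx += x[i];
            sy += y[i];
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
        }
        return new CorrelationSums(x.length, sx, sy, sxy, sxx, syy);
    }

    // "Intermediate" stage: partial sums merge by addition, in any order.
    static CorrelationSums merge(CorrelationSums a, CorrelationSums b) {
        return new CorrelationSums(a.n + b.n, a.sumX + b.sumX, a.sumY + b.sumY,
                a.sumXY + b.sumXY, a.sumXX + b.sumXX, a.sumYY + b.sumYY);
    }

    // "Final" stage: turn the merged sums into the correlation coefficient.
    double correlation() {
        double num = n * sumXY - sumX * sumY;
        double den = Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
        return num / den; // NaN when either column is constant or empty
    }

    public static void main(String[] args) {
        // Two chunks, as if produced by two map tasks.
        CorrelationSums a = initial(new double[] {1, 2}, new double[] {2, 4});
        CorrelationSums b = initial(new double[] {3, 4}, new double[] {6, 8});
        System.out.println(merge(a, b).correlation());
    }
}
```

        Because the merge step is plain addition, merging the two chunks should give the same sums as a single pass over all four rows, which is the property the combiner relies on.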


          People

          • Assignee:
            Daniel Dai
            Reporter:
            Viraj Bhat
          • Votes:
            0
            Watchers:
            1
