Details

    • Type: New Feature New Feature
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Computing aggregates over a cube of several dimensions is a common operation in data warehousing.

      The standard SQL syntax is "GROUP relation BY dim1, dim2, dim3 WITH CUBE" – which in addition to all dim1-2-3, produces aggregations for just dim1, just dim1 and dim2, etc. NULL is generally used to represent "all".

      A presentation by Arnab Nandi describes how one might implement efficient cubing in Map-Reduce here: http://pdf.cx/44wrk

      We can start with the naive solution which only works for algebraic measures, and work up from there.

      This is a candidate project for Google summer of code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012

      1. PIG-2167.4.patch
        62 kB
        Prasanth J
      2. PIG-2167.3.patch
        59 kB
        Prasanth J
      3. PIG-2167.2.patch
        58 kB
        Prasanth J
      4. PIG-2167.1.patch
        57 kB
        Prasanth J
      5. Pig-Cubing-Performance.png
        20 kB
        Prasanth J

        Activity

        Hide
        Dmitriy V. Ryaboy added a comment -

        I believe there is value to providing the naive solution and improving on it later, rather than trying to build the optimal plan from the get-go.

        Initial (naive) implementation plan:

        Add an optional "WITH CUBE" clause to the group operator.

        In LogicalPlanBuilder, if "WITH CUBE" is present, insert operators equivalent to the following above the group operator:

        relation = foreach relation generate
           FLATTEN(CubeDimensions(dim1, dim2, dim3))
             as (dim1, dim2, dim3),
           other_fields;
        

        It may be desirable in some cases to group by a superset of dimensions one wants to cube on: group by dim1, dim2, dim3 with cube on (dim1, dim2). If we want to support that use case, we simply need to know to call the UDF on (dim1, dim2) and push dim3 into the other_fields list.

        Note also that there's a bit of a problem if null values are legitimate values for the dimensions, as we use null to indicate "all". The UDF provided in PIG-2168 allows one to use custom strings instead of null for the "all" marker. We can optionally support this in the grammar, as well.

        Show
        Dmitriy V. Ryaboy added a comment - I believe there is value to providing the naive solution and improving on it later, rather than trying to build the optimal plan from the get-go. Initial (naive) implementation plan: Add an optional "WITH CUBE" clause to the group operator. In LogicalPlanBuilder, if "WITH CUBE" is present, insert operators equivalent to the following above the group operator: relation = foreach relation generate FLATTEN(CubeDimensions(dim1, dim2, dim3)) as (dim1, dim2, dim3), other_fields; It may be desirable in some cases to group by a superset of dimensions one wants to cube on: group by dim1, dim2, dim3 with cube on (dim1, dim2). If we want to support that use case, we simply need to know to call the UDF on (dim1, dim2) and push dim3 into the other_fields list. Note also that there's a bit of a problem if null values are legitimate values for the dimensions, as we use null to indicate "all". The UDF provided in PIG-2168 allows one to use custom strings instead of null for the "all" marker. We can optionally support this in the grammar, as well.
        Hide
        Prasanth J added a comment -

        Hello everyone

        I am Prasanth Jayachandran, graduate student at The Ohio State University. I am working with Prof. Arnab Nandi for providing CUBE operator support in Pig. I am submitting the initial version of the CUBE operator implementation(naive version of cube materialization). As this is my first patch submission I am really excited about it and am hoping to continue my contribution for Apache Pig. Please review the attached patch and provide feedback from improvising it.

        Following contents explains the design decision and some initial performance numbers (experiments performed on single node pseudo-distributed hadoop setup).
        Pig syntax for Cubing
        CUBE rel BY (a,b,c);

        SQL/Oracle syntax for Cubing
        GROUP BY CUBE (a,b,c);

        CUBE operator internals
        The CUBE operator injects the logical plan for following operators
        x = FOREACH rel GENERATE FLATTEN(CubeDimensions(a,b,c));
        y = GROUP x by (a,b,c);

        What is the output schema of CUBE operator?
        {group: tuple(a,b,c), cube: bag{(dimensions::a,dimensions::b,dimensions::c)}}

        Why syntactically different from SQL/Oracle?

        • Easier to implement as it does not modify or break the existing GROUP BY operator implementation
        • CUBE operator might require separate flags for the following
        • Switching between BUC and STAR cubing (future optimization)
        • HAVING clause for monotonic operations
        • ROLLUP/DRILLDOWN operations
        • Hint the location of partially computed CUBE
        • user specified inputs (example: algebraic attribute for converting the holistic measure to partially algebraic measure can be specified by user)
        • Some operations applicable in GROUP operator are not applicable for CUBE
        • Constant expression evaluation
        • Duplicate column projection
        • Follows Pig language design principle of procedural simplicity

        Corner case handling
        Constant expressions can be provided in GROUP BY operator. Constant expressions support has been removed from CUBE BY operator grammar. If constant expressions are used with CUBE BY, FrontEndException is thrown.
        Duplicate column projection is supported in GROUP BY. Duplicates columns will be eliminated while generating logical plan. Current implementation ignores duplicates dimensions in CUBE BY. This can also be modified to throw exception if user repeats a dimension more than once.
        If cube dimensions are a subset of columns in input schema then the remaining columns in the input schema will be pushed to the “cube” bag.
        For example:
        inp = LOAD ‘/pig/data/input’ AS (a,b,c,d);
        x = CUBE inp BY (a,b);
        schema of x will be

        {group:tuple(a,b), cube:bag(dimensions::a,dimensions::b,c,d)}

        Performance
        Apache Pig Test Environment
        OS: Ubuntu 11.04 running as guest OS in Virtual Box
        CPU Cores: 2 (4 Threads)
        Memory: 8GB
        HDD: 100GB
        Mode: Single node pseudo-distributed mode setup running Hadoop-0.20.2.
        Configuration: Default configurations of hadoop
        SQL Server 2008 R2 Test Environment
        OS: Windows 7
        CPU Cores: 4 (8 Threads)
        Memory: 16GB
        HDD: 500GB

        Acknowledgements
        Professor Arnab Nandi, Department of Computer Science and Engineering, The Ohio State University, Columbus for guidance and assistance throughout the course of this initial implementation.
        Chaitanya Solarpurikar, Graduate Student, Department of Computer Science and Engineering, The Ohio State University, Columbus for setting up SQL server test environment and running performance comparison experiments.

        Show
        Prasanth J added a comment - Hello everyone I am Prasanth Jayachandran, graduate student at The Ohio State University. I am working with Prof. Arnab Nandi for providing CUBE operator support in Pig. I am submitting the initial version of the CUBE operator implementation(naive version of cube materialization). As this is my first patch submission I am really excited about it and am hoping to continue my contribution for Apache Pig. Please review the attached patch and provide feedback from improvising it. Following contents explains the design decision and some initial performance numbers (experiments performed on single node pseudo-distributed hadoop setup). Pig syntax for Cubing CUBE rel BY (a,b,c); SQL/Oracle syntax for Cubing GROUP BY CUBE (a,b,c); CUBE operator internals The CUBE operator injects the logical plan for following operators x = FOREACH rel GENERATE FLATTEN(CubeDimensions(a,b,c)); y = GROUP x by (a,b,c); What is the output schema of CUBE operator? {group: tuple(a,b,c), cube: bag{(dimensions::a,dimensions::b,dimensions::c)}} Why syntactically different from SQL/Oracle? Easier to implement as it does not modify or break the existing GROUP BY operator implementation CUBE operator might require separate flags for the following Switching between BUC and STAR cubing (future optimization) HAVING clause for monotonic operations ROLLUP/DRILLDOWN operations Hint the location of partially computed CUBE user specified inputs (example: algebraic attribute for converting the holistic measure to partially algebraic measure can be specified by user) Some operations applicable in GROUP operator are not applicable for CUBE Constant expression evaluation Duplicate column projection Follows Pig language design principle of procedural simplicity Corner case handling Constant expressions can be provided in GROUP BY operator. Constant expressions support has been removed from CUBE BY operator grammar. If constant expressions are used with CUBE BY, FrontEndException is thrown. Duplicate column projection is supported in GROUP BY. Duplicates columns will be eliminated while generating logical plan. Current implementation ignores duplicates dimensions in CUBE BY. This can also be modified to throw exception if user repeats a dimension more than once. If cube dimensions are a subset of columns in input schema then the remaining columns in the input schema will be pushed to the “cube” bag. For example: inp = LOAD ‘/pig/data/input’ AS (a,b,c,d); x = CUBE inp BY (a,b); schema of x will be {group:tuple(a,b), cube:bag(dimensions::a,dimensions::b,c,d)} Performance Apache Pig Test Environment OS: Ubuntu 11.04 running as guest OS in Virtual Box CPU Cores: 2 (4 Threads) Memory: 8GB HDD: 100GB Mode: Single node pseudo-distributed mode setup running Hadoop-0.20.2. Configuration: Default configurations of hadoop SQL Server 2008 R2 Test Environment OS: Windows 7 CPU Cores: 4 (8 Threads) Memory: 16GB HDD: 500GB Acknowledgements Professor Arnab Nandi, Department of Computer Science and Engineering, The Ohio State University, Columbus for guidance and assistance throughout the course of this initial implementation. Chaitanya Solarpurikar, Graduate Student, Department of Computer Science and Engineering, The Ohio State University, Columbus for setting up SQL server test environment and running performance comparison experiments.
        Hide
        Dmitriy V. Ryaboy added a comment -

        I wouldn't put too much weight on timing local mode. You are likely spending most of the time in JobControl's sleep(5000). We'll see about posting our patch for that.

        Show
        Dmitriy V. Ryaboy added a comment - I wouldn't put too much weight on timing local mode. You are likely spending most of the time in JobControl's sleep(5000). We'll see about posting our patch for that.
        Hide
        Dmitriy V. Ryaboy added a comment -

        Assigning to self as per apache guidelines, as I'd like to mentor this. Putting my application in right now.

        Show
        Dmitriy V. Ryaboy added a comment - Assigning to self as per apache guidelines, as I'd like to mentor this. Putting my application in right now.
        Hide
        Prasanth J added a comment -

        Hello Everyone
        I have added the proposal for cubing project at GSoC Proposal

        Dmitriy has already reviewed it. Made it public for comments from everyone.

        Please let me know your comments/feedback, so that I can incorporate the changes before deadline.

        Show
        Prasanth J added a comment - Hello Everyone I have added the proposal for cubing project at GSoC Proposal Dmitriy has already reviewed it. Made it public for comments from everyone. Please let me know your comments/feedback, so that I can incorporate the changes before deadline.
        Hide
        Prasanth J added a comment -

        The URL didn't redirect correctly. Pasting it again
        http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/pras_j/1

        Show
        Prasanth J added a comment - The URL didn't redirect correctly. Pasting it again http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/pras_j/1
        Hide
        Dmitriy V. Ryaboy added a comment -

        Prasanth, could you post your initial patch (the one already attached to this jira) to review board?

        Show
        Dmitriy V. Ryaboy added a comment - Prasanth, could you post your initial patch (the one already attached to this jira) to review board?
        Hide
        Prasanth J added a comment -

        Posted in review board https://reviews.apache.org/r/4670/

        Let me know in case if missed some details.

        Show
        Prasanth J added a comment - Posted in review board https://reviews.apache.org/r/4670/ Let me know in case if missed some details.
        Hide
        Prasanth J added a comment -

        Submitting patch(version 2) with updates based on code review comments by Dmitriy.

        Show
        Prasanth J added a comment - Submitting patch(version 2) with updates based on code review comments by Dmitriy.
        Hide
        Dmitriy V. Ryaboy added a comment -

        Prasanth, seems like you didn't update the RB?

        Makes it hard to see what changed between v.1 and v.2

        Show
        Dmitriy V. Ryaboy added a comment - Prasanth, seems like you didn't update the RB? Makes it hard to see what changed between v.1 and v.2
        Hide
        Prasanth J added a comment -

        Dmitriy,

        I have updated v2 of the patch to the RB just after replying to your review comments.

        Here is the link for the diff against the baseline
        https://reviews.apache.org/r/4670/diff/2/

        Here is the link for diff against Patch-1
        https://reviews.apache.org/r/4670/diff/1-2/#index_header

        Please let me know if I missed out something while updating.

        Show
        Prasanth J added a comment - Dmitriy, I have updated v2 of the patch to the RB just after replying to your review comments. Here is the link for the diff against the baseline https://reviews.apache.org/r/4670/diff/2/ Here is the link for diff against Patch-1 https://reviews.apache.org/r/4670/diff/1-2/#index_header Please let me know if I missed out something while updating.
        Hide
        Dmitriy V. Ryaboy added a comment -

        Quite right, I was looking at a different RB. Sorry bout that. Reviewing .

        Show
        Dmitriy V. Ryaboy added a comment - Quite right, I was looking at a different RB. Sorry bout that. Reviewing .
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4670/
        -----------------------------------------------------------

        (Updated 2012-04-30 04:05:03.466616)

        Review request for pig and Dmitriy Ryaboy.

        Changes
        -------

        Updated JIRA issue field to reflect just the issue number.

        Summary
        -------

        This is a review board for https://issues.apache.org/jira/browse/PIG-2167

        This addresses bug PIG-2167.
        https://issues.apache.org/jira/browse/PIG-2167

        Diffs


        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java 1325115
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java 1325115
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java PRE-CREATION
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java 1325115
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java 1325115
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1325115
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1325115
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1325115
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1325115
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1325115
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1325115
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1325115
        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLexer.pig 1325115
        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLogicalPlanGenerator.java 1325115
        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestParser.pig 1325115
        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryLexer.java 1325115
        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryParser.java 1325115
        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java PRE-CREATION

        Diff: https://reviews.apache.org/r/4670/diff

        Testing
        -------

        Unit tests: All passed

        Pre-commit tests: All passed
        ant clean test-commit

        Thanks,

        Prasanth_J

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4670/ ----------------------------------------------------------- (Updated 2012-04-30 04:05:03.466616) Review request for pig and Dmitriy Ryaboy. Changes ------- Updated JIRA issue field to reflect just the issue number. Summary ------- This is a review board for https://issues.apache.org/jira/browse/PIG-2167 This addresses bug PIG-2167 . https://issues.apache.org/jira/browse/PIG-2167 Diffs http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLexer.pig 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLogicalPlanGenerator.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestParser.pig 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryLexer.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryParser.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java PRE-CREATION Diff: https://reviews.apache.org/r/4670/diff Testing ------- Unit tests: All passed Pre-commit tests: All passed ant clean test-commit Thanks, Prasanth_J
        Hide
        jiraposter@reviews.apache.org added a comment -

        On 2012-04-30 00:54:08, Dmitriy Ryaboy wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java, line 74

        > <https://reviews.apache.org/r/4670/diff/1-2/?file=100633#file100633line74>

        >

        > I see, you cut this out in LogicalPlanGenerator so you assume this won't get called before the plan is generated. That'll probably work for now. We should definitely fix this in the next stage.

        Yeah. Will fix that in the next stage (implementing physical operator).

        On 2012-04-30 00:54:08, Dmitriy Ryaboy wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java, line 561

        > <https://reviews.apache.org/r/4670/diff/1-2/?file=100639#file100639line561>

        >

        > you can merge this try/catch with the one above.

        Done.

        On 2012-04-30 00:54:08, Dmitriy Ryaboy wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java, line 58

        > <https://reviews.apache.org/r/4670/diff/1-2/?file=100648#file100648line58>

        >

        > try Util.createFile(String[] data)

        >

        > Even better, you can now use the new MockLoader from PIG-2650 (save a bit more time and complexity on the tests..)

        Updated. Now all test cases use MockLoader.

        On 2012-04-30 00:54:08, Dmitriy Ryaboy wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java, line 302

        > <https://reviews.apache.org/r/4670/diff/1-2/?file=100648#file100648line302>

        >

        > return here

        Updated.

        On 2012-04-30 00:54:08, Dmitriy Ryaboy wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java, line 318

        > <https://reviews.apache.org/r/4670/diff/1-2/?file=100648#file100648line318>

        >

        > Assert.fail("Expected to throw an exception when duplicate dimensions are detected!");

        >

        > (otherwise your test passes even if this doesn't work!)

        Updated.

        On 2012-04-30 00:54:08, Dmitriy Ryaboy wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java, line 74

        > <https://reviews.apache.org/r/4670/diff/1-2/?file=100648#file100648line74>

        >

        > ExecMode.LOCAL iirc

        PigServer constructor accepts string ('local') or ExecType (ExecType.LOCAL). If its a string it internally converts to corresponding ExecType.

        • Prasanth_J

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4670/#review7373
        -----------------------------------------------------------

        On 2012-04-30 04:05:03, Prasanth_J wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4670/

        -----------------------------------------------------------

        (Updated 2012-04-30 04:05:03)

        Review request for pig and Dmitriy Ryaboy.

        Summary

        -------

        This is a review board for https://issues.apache.org/jira/browse/PIG-2167

        This addresses bug PIG-2167.

        https://issues.apache.org/jira/browse/PIG-2167

        Diffs

        -----

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java PRE-CREATION

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLexer.pig 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLogicalPlanGenerator.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestParser.pig 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryLexer.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryParser.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java PRE-CREATION

        Diff: https://reviews.apache.org/r/4670/diff

        Testing

        -------

        Unit tests: All passed

        Pre-commit tests: All passed

        ant clean test-commit

        Thanks,

        Prasanth_J

        Show
        jiraposter@reviews.apache.org added a comment - On 2012-04-30 00:54:08, Dmitriy Ryaboy wrote: > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java , line 74 > < https://reviews.apache.org/r/4670/diff/1-2/?file=100633#file100633line74 > > > I see, you cut this out in LogicalPlanGenerator so you assume this won't get called before the plan is generated. That'll probably work for now. We should definitely fix this in the next stage. Yeah. Will fix that in the next stage (implementing physical operator). On 2012-04-30 00:54:08, Dmitriy Ryaboy wrote: > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java , line 561 > < https://reviews.apache.org/r/4670/diff/1-2/?file=100639#file100639line561 > > > you can merge this try/catch with the one above. Done. On 2012-04-30 00:54:08, Dmitriy Ryaboy wrote: > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java , line 58 > < https://reviews.apache.org/r/4670/diff/1-2/?file=100648#file100648line58 > > > try Util.createFile(String[] data) > > Even better, you can now use the new MockLoader from PIG-2650 (save a bit more time and complexity on the tests..) Updated. Now all test cases use MockLoader. On 2012-04-30 00:54:08, Dmitriy Ryaboy wrote: > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java , line 302 > < https://reviews.apache.org/r/4670/diff/1-2/?file=100648#file100648line302 > > > return here Updated. On 2012-04-30 00:54:08, Dmitriy Ryaboy wrote: > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java , line 318 > < https://reviews.apache.org/r/4670/diff/1-2/?file=100648#file100648line318 > > > Assert.fail("Expected to throw an exception when duplicate dimensions are detected!"); > > (otherwise your test passes even if this doesn't work!) Updated. On 2012-04-30 00:54:08, Dmitriy Ryaboy wrote: > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java , line 74 > < https://reviews.apache.org/r/4670/diff/1-2/?file=100648#file100648line74 > > > ExecMode.LOCAL iirc PigServer constructor accepts string ('local') or ExecType (ExecType.LOCAL). If its a string it internally converts to corresponding ExecType. Prasanth_J ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4670/#review7373 ----------------------------------------------------------- On 2012-04-30 04:05:03, Prasanth_J wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4670/ ----------------------------------------------------------- (Updated 2012-04-30 04:05:03) Review request for pig and Dmitriy Ryaboy. Summary ------- This is a review board for https://issues.apache.org/jira/browse/PIG-2167 This addresses bug PIG-2167 . https://issues.apache.org/jira/browse/PIG-2167 Diffs ----- http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLexer.pig 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLogicalPlanGenerator.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestParser.pig 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryLexer.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryParser.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java PRE-CREATION Diff: https://reviews.apache.org/r/4670/diff Testing ------- Unit tests: All passed Pre-commit tests: All passed ant clean test-commit Thanks, Prasanth_J
        Hide
        Prasanth J added a comment -

        I am not able to attach the patch(created using git diff) to review board. So attaching it here.

        Show
        Prasanth J added a comment - I am not able to attach the patch(created using git diff) to review board. So attaching it here.
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4670/#review7484
        -----------------------------------------------------------

        This review took a little time as I got interrupted in the middle.
        Great job overall.
        Some comments are about refactoring a little part of the code your modifying.
        Let me know what you think.

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java
        <https://reviews.apache.org/r/4670/#comment16557>

        iterate on entries() instead

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java
        <https://reviews.apache.org/r/4670/#comment16561>

        All those for loops could be factored into a private visitAll(Collection<LogicalExpressionPlan>)

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java
        <https://reviews.apache.org/r/4670/#comment16562>

        same comment.
        You can factor out the for loops in those methods.

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java
        <https://reviews.apache.org/r/4670/#comment16563>

        chain all exceptions

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java
        <https://reviews.apache.org/r/4670/#comment16564>

        same

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g
        <https://reviews.apache.org/r/4670/#comment16566>

        do you know if there's a way to avoid duplication across .g files?

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java
        <https://reviews.apache.org/r/4670/#comment16567>

        you can make this a javadoc comment

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java
        <https://reviews.apache.org/r/4670/#comment16568>

        this is not affected to anything. Is it intended?

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java
        <https://reviews.apache.org/r/4670/#comment16569>

        same comment. I guess it is intended, but it looks strange. If the constructor has side effect, maybe there should be another way

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java
        <https://reviews.apache.org/r/4670/#comment16776>

        add the name of the duplicate in the error message

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g
        <https://reviews.apache.org/r/4670/#comment16777>

        please add some comments to explain what you're doing here

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java
        <https://reviews.apache.org/r/4670/#comment16778>

        you van now use mock.Storage() for those tests.
        see PIG-2650

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java
        <https://reviews.apache.org/r/4670/#comment16779>

        Use the message part of the assert statement to ensure a good error message when it fails.

        Tuple t = it.next()
        assertTrue(expected+" contains "+t, expected.contains(t))

        • Julien

        On 2012-04-30 04:05:03, Prasanth_J wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4670/

        -----------------------------------------------------------

        (Updated 2012-04-30 04:05:03)

        Review request for pig and Dmitriy Ryaboy.

        Summary

        -------

        This is a review board for https://issues.apache.org/jira/browse/PIG-2167

        This addresses bug PIG-2167.

        https://issues.apache.org/jira/browse/PIG-2167

        Diffs

        -----

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java PRE-CREATION

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLexer.pig 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLogicalPlanGenerator.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestParser.pig 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryLexer.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryParser.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java PRE-CREATION

        Diff: https://reviews.apache.org/r/4670/diff

        Testing

        -------

        Unit tests: All passed

        Pre-commit tests: All passed

        ant clean test-commit

        Thanks,

        Prasanth_J

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4670/#review7484 ----------------------------------------------------------- This review took a little time as I got interrupted in the middle. Great job overall. Some comments are about refactoring a little part of the code your modifying. Let me know what you think. http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java < https://reviews.apache.org/r/4670/#comment16557 > iterate on entries() instead http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java < https://reviews.apache.org/r/4670/#comment16561 > All those for loops could be factored into a private visitAll(Collection<LogicalExpressionPlan>) http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java < https://reviews.apache.org/r/4670/#comment16562 > same comment. You can factor out the for loops in those methods. http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java < https://reviews.apache.org/r/4670/#comment16563 > chain all exceptions http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java < https://reviews.apache.org/r/4670/#comment16564 > same http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g < https://reviews.apache.org/r/4670/#comment16566 > do you know if there's a way to avoid duplication across .g files? http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java < https://reviews.apache.org/r/4670/#comment16567 > you can make this a javadoc comment http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java < https://reviews.apache.org/r/4670/#comment16568 > this is not affected to anything. Is it intended? http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java < https://reviews.apache.org/r/4670/#comment16569 > same comment. I guess it is intended, but it looks strange. If the constructor has side effect, maybe there should be another way http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java < https://reviews.apache.org/r/4670/#comment16776 > add the name of the duplicate in the error message http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g < https://reviews.apache.org/r/4670/#comment16777 > please add some comments to explain what you're doing here http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java < https://reviews.apache.org/r/4670/#comment16778 > you van now use mock.Storage() for those tests. see PIG-2650 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java < https://reviews.apache.org/r/4670/#comment16779 > Use the message part of the assert statement to ensure a good error message when it fails. Tuple t = it.next() assertTrue(expected+" contains "+t, expected.contains(t)) Julien On 2012-04-30 04:05:03, Prasanth_J wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4670/ ----------------------------------------------------------- (Updated 2012-04-30 04:05:03) Review request for pig and Dmitriy Ryaboy. Summary ------- This is a review board for https://issues.apache.org/jira/browse/PIG-2167 This addresses bug PIG-2167 . https://issues.apache.org/jira/browse/PIG-2167 Diffs ----- http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLexer.pig 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLogicalPlanGenerator.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestParser.pig 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryLexer.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryParser.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java PRE-CREATION Diff: https://reviews.apache.org/r/4670/diff Testing ------- Unit tests: All passed Pre-commit tests: All passed ant clean test-commit Thanks, Prasanth_J
        Hide
        jiraposter@reviews.apache.org added a comment -

        On 2012-05-05 00:09:48, Julien Le Dem wrote:

        > This review took a little time as I got interrupted in the middle.

        > Great job overall.

        > Some comments are about refactoring a little part of the code your modifying.

        > Let me know what you think.

        Thanks for the review julien. I made patch v3 based on Dmitriy's review comments but was not able to upload in RB because git diff's are compatible with RB. I have incorporated most of your review comments in this patch (v4).
        Please check my comments below.

        On 2012-05-05 00:09:48, Julien Le Dem wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java, line 98

        > <https://reviews.apache.org/r/4670/diff/2/?file=101421#file101421line98>

        >

        > iterate on entries() instead

        The MultiMap class under org.apache.pig.impl.util package doesn't seem to support iteration using entrySet.

        On 2012-05-05 00:09:48, Julien Le Dem wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java, lines 100-103

        > <https://reviews.apache.org/r/4670/diff/2/?file=101421#file101421line100>

        >

        > All those for loops could be factored into a private visitAll(Collection<LogicalExpressionPlan>)

        Good catch. Added new private method and refactored all occurrences of for loop to use this method.

        On 2012-05-05 00:09:48, Julien Le Dem wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java, lines 137-140

        > <https://reviews.apache.org/r/4670/diff/2/?file=101422#file101422line137>

        >

        > same comment.

        > You can factor out the for loops in those methods.

        Added new private method.

        On 2012-05-05 00:09:48, Julien Le Dem wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java, line 78

        > <https://reviews.apache.org/r/4670/diff/2/?file=101423#file101423line78>

        >

        > chain all exceptions

        Modified.

        On 2012-05-05 00:09:48, Julien Le Dem wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java, line 397

        > <https://reviews.apache.org/r/4670/diff/2/?file=101429#file101429line397>

        >

        > you can make this a javadoc comment

        is javadoc generated for private methods as well? Since this is a private method I kept it as a simple comment.

        On 2012-05-05 00:09:48, Julien Le Dem wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java, line 608

        > <https://reviews.apache.org/r/4670/diff/2/?file=101429#file101429line608>

        >

        > add the name of the duplicate in the error message

        Added.

        On 2012-05-05 00:09:48, Julien Le Dem wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java, line 48

        > <https://reviews.apache.org/r/4670/diff/2/?file=101438#file101438line48>

        >

        > you van now use mock.Storage() for those tests.

        > see PIG-2650

        Updated from patch v3 onwards.

        On 2012-05-05 00:09:48, Julien Le Dem wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java, line 139

        > <https://reviews.apache.org/r/4670/diff/2/?file=101438#file101438line139>

        >

        > Use the message part of the assert statement to ensure a good error message when it fails.

        >

        > Tuple t = it.next()

        > assertTrue(expected+" contains "+t, expected.contains(t))

        Updated.

        On 2012-05-05 00:09:48, Julien Le Dem wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java, line 522

        > <https://reviews.apache.org/r/4670/diff/2/?file=101429#file101429line522>

        >

        > same comment. I guess it is intended, but it looks strange. If the constructor has side effect, maybe there should be another way

        This inserts a project expression to logical expression plan (lEplan in this case). I found this way in other files too. I could not find a better way to attach a project expression to a logical expression plan. Do you have any other thoughts/ways for doing this?

        On 2012-05-05 00:09:48, Julien Le Dem wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g, line 469

        > <https://reviews.apache.org/r/4670/diff/2/?file=101430#file101430line469>

        >

        > please add some comments to explain what you're doing here

        Updated.

        On 2012-05-05 00:09:48, Julien Le Dem wrote:

        > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g, lines 259-273

        > <https://reviews.apache.org/r/4670/diff/2/?file=101428#file101428line259>

        >

        > do you know if there's a way to avoid duplication across .g files?

        Since all these grammar files serves different purposes, I am not sure if it is possible to avoid duplicates.

        • Prasanth_J

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4670/#review7484
        -----------------------------------------------------------

        On 2012-04-30 04:05:03, Prasanth_J wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4670/

        -----------------------------------------------------------

        (Updated 2012-04-30 04:05:03)

        Review request for pig and Dmitriy Ryaboy.

        Summary

        -------

        This is a review board for https://issues.apache.org/jira/browse/PIG-2167

        This addresses bug PIG-2167.

        https://issues.apache.org/jira/browse/PIG-2167

        Diffs

        -----

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java PRE-CREATION

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLexer.pig 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLogicalPlanGenerator.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestParser.pig 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryLexer.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryParser.java 1325115

        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java PRE-CREATION

        Diff: https://reviews.apache.org/r/4670/diff

        Testing

        -------

        Unit tests: All passed

        Pre-commit tests: All passed

        ant clean test-commit

        Thanks,

        Prasanth_J

        Show
        jiraposter@reviews.apache.org added a comment - On 2012-05-05 00:09:48, Julien Le Dem wrote: > This review took a little time as I got interrupted in the middle. > Great job overall. > Some comments are about refactoring a little part of the code your modifying. > Let me know what you think. Thanks for the review julien. I made patch v3 based on Dmitriy's review comments but was not able to upload in RB because git diff's are compatible with RB. I have incorporated most of your review comments in this patch (v4). Please check my comments below. On 2012-05-05 00:09:48, Julien Le Dem wrote: > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java , line 98 > < https://reviews.apache.org/r/4670/diff/2/?file=101421#file101421line98 > > > iterate on entries() instead The MultiMap class under org.apache.pig.impl.util package doesn't seem to support iteration using entrySet. On 2012-05-05 00:09:48, Julien Le Dem wrote: > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java , lines 100-103 > < https://reviews.apache.org/r/4670/diff/2/?file=101421#file101421line100 > > > All those for loops could be factored into a private visitAll(Collection<LogicalExpressionPlan>) Good catch. Added new private method and refactored all occurrences of for loop to use this method. On 2012-05-05 00:09:48, Julien Le Dem wrote: > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java , lines 137-140 > < https://reviews.apache.org/r/4670/diff/2/?file=101422#file101422line137 > > > same comment. > You can factor out the for loops in those methods. Added new private method. On 2012-05-05 00:09:48, Julien Le Dem wrote: > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java , line 78 > < https://reviews.apache.org/r/4670/diff/2/?file=101423#file101423line78 > > > chain all exceptions Modified. On 2012-05-05 00:09:48, Julien Le Dem wrote: > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java , line 397 > < https://reviews.apache.org/r/4670/diff/2/?file=101429#file101429line397 > > > you can make this a javadoc comment is javadoc generated for private methods as well? Since this is a private method I kept it as a simple comment. On 2012-05-05 00:09:48, Julien Le Dem wrote: > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java , line 608 > < https://reviews.apache.org/r/4670/diff/2/?file=101429#file101429line608 > > > add the name of the duplicate in the error message Added. On 2012-05-05 00:09:48, Julien Le Dem wrote: > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java , line 48 > < https://reviews.apache.org/r/4670/diff/2/?file=101438#file101438line48 > > > you van now use mock.Storage() for those tests. > see PIG-2650 Updated from patch v3 onwards. On 2012-05-05 00:09:48, Julien Le Dem wrote: > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java , line 139 > < https://reviews.apache.org/r/4670/diff/2/?file=101438#file101438line139 > > > Use the message part of the assert statement to ensure a good error message when it fails. > > Tuple t = it.next() > assertTrue(expected+" contains "+t, expected.contains(t)) Updated. On 2012-05-05 00:09:48, Julien Le Dem wrote: > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java , line 522 > < https://reviews.apache.org/r/4670/diff/2/?file=101429#file101429line522 > > > same comment. I guess it is intended, but it looks strange. If the constructor has side effect, maybe there should be another way This inserts a project expression to logical expression plan (lEplan in this case). I found this way in other files too. I could not find a better way to attach a project expression to a logical expression plan. Do you have any other thoughts/ways for doing this? On 2012-05-05 00:09:48, Julien Le Dem wrote: > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g , line 469 > < https://reviews.apache.org/r/4670/diff/2/?file=101430#file101430line469 > > > please add some comments to explain what you're doing here Updated. On 2012-05-05 00:09:48, Julien Le Dem wrote: > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g , lines 259-273 > < https://reviews.apache.org/r/4670/diff/2/?file=101428#file101428line259 > > > do you know if there's a way to avoid duplication across .g files? Since all these grammar files serves different purposes, I am not sure if it is possible to avoid duplicates. Prasanth_J ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4670/#review7484 ----------------------------------------------------------- On 2012-04-30 04:05:03, Prasanth_J wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4670/ ----------------------------------------------------------- (Updated 2012-04-30 04:05:03) Review request for pig and Dmitriy Ryaboy. Summary ------- This is a review board for https://issues.apache.org/jira/browse/PIG-2167 This addresses bug PIG-2167 . https://issues.apache.org/jira/browse/PIG-2167 Diffs ----- http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLexer.pig 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLogicalPlanGenerator.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestParser.pig 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryLexer.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryParser.java 1325115 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java PRE-CREATION Diff: https://reviews.apache.org/r/4670/diff Testing ------- Unit tests: All passed Pre-commit tests: All passed ant clean test-commit Thanks, Prasanth_J
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4670/
        -----------------------------------------------------------

        (Updated 2012-05-06 00:13:16.285229)

        Review request for pig and Dmitriy Ryaboy.

        Changes
        -------

        Incorporated Dmitriy's and Julien's review comments.

        Unit tests: All passed

        Pre-commit tests: All passed
        ant clean test-commit

        Summary
        -------

        This is a review board for https://issues.apache.org/jira/browse/PIG-2167

        This addresses bug PIG-2167.
        https://issues.apache.org/jira/browse/PIG-2167

        Diffs (updated)


        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLogicalPlanGenerator.java 1334534
        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestParser.pig 1334534
        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryLexer.java 1334534
        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryParser.java 1334534
        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java PRE-CREATION
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1334534
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1334534
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1334534
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1334534
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1334534
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1334534
        http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLexer.pig 1334534
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java PRE-CREATION
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java 1334534
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java 1334534
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1334534
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java 1334534
        http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java 1334534

        Diff: https://reviews.apache.org/r/4670/diff

        Testing
        -------

        Unit tests: All passed

        Pre-commit tests: All passed
        ant clean test-commit

        Thanks,

        Prasanth_J

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4670/ ----------------------------------------------------------- (Updated 2012-05-06 00:13:16.285229) Review request for pig and Dmitriy Ryaboy. Changes ------- Incorporated Dmitriy's and Julien's review comments. Unit tests: All passed Pre-commit tests: All passed ant clean test-commit Summary ------- This is a review board for https://issues.apache.org/jira/browse/PIG-2167 This addresses bug PIG-2167 . https://issues.apache.org/jira/browse/PIG-2167 Diffs (updated) http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLogicalPlanGenerator.java 1334534 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestParser.pig 1334534 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryLexer.java 1334534 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryParser.java 1334534 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1334534 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1334534 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1334534 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1334534 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1334534 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1334534 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLexer.pig 1334534 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalRelationalNodesVisitor.java 1334534 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ProjectStarExpander.java 1334534 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1334534 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/SchemaResetter.java 1334534 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/optimizer/AllExpressionVisitor.java 1334534 Diff: https://reviews.apache.org/r/4670/diff Testing ------- Unit tests: All passed Pre-commit tests: All passed ant clean test-commit Thanks, Prasanth_J
        Hide
        Prasanth J added a comment -

        Uploading patch with Julien's review comments incorporated.

        Modifications:
        1) Testing uses mock.Storage()
        2) Chaining exceptions
        3) Refactored iterative visits to a separate private method in logical optimizers

        Show
        Prasanth J added a comment - Uploading patch with Julien's review comments incorporated. Modifications: 1) Testing uses mock.Storage() 2) Chaining exceptions 3) Refactored iterative visits to a separate private method in logical optimizers
        Hide
        Prasanth J added a comment -

        Hello everyone

        I will be working with Dmitriy on this issue for GSoC 2012 which is about to start next week (May 21). Before getting started with the project I would like to hear from the community about the syntax for CUBE (and related) operators. As a part of this project I also have plans for implementing hierarchy based grouping, partial grouping and monotonicity based pruning using HAVING clause.

        Following are the list of tasks that require modifications to the grammar file and for each of them I have included SQL/Oracle syntax, the proposed syntax for pig and the corresponding output groups. I would like to know feedback for each of the suggested syntax and finalize the syntax for all of them.

        1) CUBE operation (all combinations of grouping)
        SQL/Oracle syntax

        GROUP BY CUBE (a,b,c);

        Current syntax in Pig

        alias = CUBE rel BY (a,b,c);

        Output groups:

        {(a,b,c), (a,b), (b,c), (a,c), (a), (b), (c), ()}

        2) ROLLUP operation (hierarchy based grouping)
        SQL/Oracle syntax

        GROUP BY ROLLUP (a,b,c);

        Proposed syntax for Pig:

        alias = CUBE rel BY ROLLUP(a,b,c);

        Output groups:

        {(a,b,c), (a,b), (a), ()}

        3) GROUPING SETS operation (partial grouping)
        SQL/Oracle syntax

        GROUP BY GROUPING SETS ((a,b),(b),());

        Proposed syntax for Pig:

        alias = CUBE rel BY GROUPING SETS((a,b),(b),());

        Output groups:

        {(a,b), (b), ()}

        4) HAVING clause (for pruning the groups that does not satisfy the specified condition)
        SQL/Oracle syntax:

        GROUP BY CUBE (x,y)  
        HAVING SUM(z)>100

        Proposed syntax for Pig:

        alias = CUBE rel BY (a,b,c) HAVING SUM(d)>100;

        Output groups:
        Only the groups that satisfy the specified condition appears in the output

        The primary reasons for having a separate CUBE operator are

        • Ease of implementation without modification to existing GROUP BY operator implementation
        • Ability to have a separate Physical Operator (POCube)
        • Cube specific optimizations can be applied to the physical operator
        • Some operations applicable to GROUP operator are not applicable for CUBE
          • Constant expression evaluation
          • Duplicate column projection

        Please let me know your feedback/suggestions for the operator syntax.

        References:

        Show
        Prasanth J added a comment - Hello everyone I will be working with Dmitriy on this issue for GSoC 2012 which is about to start next week (May 21). Before getting started with the project I would like to hear from the community about the syntax for CUBE (and related) operators. As a part of this project I also have plans for implementing hierarchy based grouping, partial grouping and monotonicity based pruning using HAVING clause. Following are the list of tasks that require modifications to the grammar file and for each of them I have included SQL/Oracle syntax, the proposed syntax for pig and the corresponding output groups. I would like to know feedback for each of the suggested syntax and finalize the syntax for all of them. 1) CUBE operation (all combinations of grouping) SQL/Oracle syntax GROUP BY CUBE (a,b,c); Current syntax in Pig alias = CUBE rel BY (a,b,c); Output groups: {(a,b,c), (a,b), (b,c), (a,c), (a), (b), (c), ()} 2) ROLLUP operation (hierarchy based grouping) SQL/Oracle syntax GROUP BY ROLLUP (a,b,c); Proposed syntax for Pig: alias = CUBE rel BY ROLLUP(a,b,c); Output groups: {(a,b,c), (a,b), (a), ()} 3) GROUPING SETS operation (partial grouping) SQL/Oracle syntax GROUP BY GROUPING SETS ((a,b),(b),()); Proposed syntax for Pig: alias = CUBE rel BY GROUPING SETS((a,b),(b),()); Output groups: {(a,b), (b), ()} 4) HAVING clause (for pruning the groups that does not satisfy the specified condition) SQL/Oracle syntax: GROUP BY CUBE (x,y) HAVING SUM(z)>100 Proposed syntax for Pig: alias = CUBE rel BY (a,b,c) HAVING SUM(d)>100; Output groups: Only the groups that satisfy the specified condition appears in the output The primary reasons for having a separate CUBE operator are Ease of implementation without modification to existing GROUP BY operator implementation Ability to have a separate Physical Operator (POCube) Cube specific optimizations can be applied to the physical operator Some operations applicable to GROUP operator are not applicable for CUBE Constant expression evaluation Duplicate column projection Please let me know your feedback/suggestions for the operator syntax. References: Oracle documentation for CUBE http://docs.oracle.com/cd/B19306_01/server.102/b14223/aggreg.htm#i1012749 SQL server documentation for CUBE http://msdn.microsoft.com/en-us/library/ms177673.aspx Simple CUBE/ROLLUP operations in Oracle and SQL http://sqlfiddle.com/#!3/3bba2/11 http://sqlfiddle.com/#!4/32c81/26
        Hide
        Dmitriy V. Ryaboy added a comment -

        All looks good except for HAVING. The reason SQL has a HAVING clause is that it doesn't have a way to express post-group filtering without using subselects. In Pig Latin, we do – it's just a FILTER clause that follows your grouping clause. So the same effect can be achieved:

        alias = CUBE rel BY (a,b,c);
        filtered_alias = FILTER alias BY SUM(rel.d) > 10;
        

        Note that this assumes the schema of alias is (group, rel) where "rel" is a complex tuple representing a matching row from the original relation (the same way the GROUP operator currently works).

        D

        Show
        Dmitriy V. Ryaboy added a comment - All looks good except for HAVING. The reason SQL has a HAVING clause is that it doesn't have a way to express post-group filtering without using subselects. In Pig Latin, we do – it's just a FILTER clause that follows your grouping clause. So the same effect can be achieved: alias = CUBE rel BY (a,b,c); filtered_alias = FILTER alias BY SUM(rel.d) > 10; Note that this assumes the schema of alias is (group, rel) where "rel" is a complex tuple representing a matching row from the original relation (the same way the GROUP operator currently works). D
        Hide
        Dmitriy V. Ryaboy added a comment -

        Naive Cube implementation looks good. +1, will commit.

        Let's create subtickets for incremental improvements as you have outlined them in the GSoC proposal, and continue discussion in those. Might as well create a subticket for the naive one (otherwise I have nothing to close out ).

        Show
        Dmitriy V. Ryaboy added a comment - Naive Cube implementation looks good. +1, will commit. Let's create subtickets for incremental improvements as you have outlined them in the GSoC proposal, and continue discussion in those. Might as well create a subticket for the naive one (otherwise I have nothing to close out ).
        Hide
        Russell Jurney added a comment -

        Playing with this now, applied it to 0.10. Thanks!

        Show
        Russell Jurney added a comment - Playing with this now, applied it to 0.10. Thanks!
        Hide
        Prasanth J added a comment -

        Russell.. Try using the patch in JIRA-2170. I uploaded a new one yesterday which fixes a exception with illustrate operator..

        Show
        Prasanth J added a comment - Russell.. Try using the patch in JIRA-2170. I uploaded a new one yesterday which fixes a exception with illustrate operator..
        Hide
        Russell Jurney added a comment -

        Thanks prasanth, but which JIRA do you mean? PIG-2170 hasn't been updated since august?

        Show
        Russell Jurney added a comment - Thanks prasanth, but which JIRA do you mean? PIG-2170 hasn't been updated since august?
        Hide
        Prasanth J added a comment -

        Oops. My bad. Its PIG-2710 actually.

        Show
        Prasanth J added a comment - Oops. My bad. Its PIG-2710 actually.
        Hide
        zhchbin added a comment -

        Hi Prasanth,
        I'm interesting in data cube and map reduce. So I am reading the source code of CUBE operation and the associated paper.
        In the paper <Distributed Cube Materialization on Holistic Measures>, it mentions "batch areas", have you implement it here?
        If "batch areas" has been implemented, could you please tell me where is it? I can't find it. Thank you so much.

        Show
        zhchbin added a comment - Hi Prasanth, I'm interesting in data cube and map reduce. So I am reading the source code of CUBE operation and the associated paper. In the paper <Distributed Cube Materialization on Holistic Measures>, it mentions "batch areas", have you implement it here? If "batch areas" has been implemented, could you please tell me where is it? I can't find it. Thank you so much.
        Hide
        Prasanth J added a comment -

        zhchbin Thanks for your interest. The cube work in Pig had many parts to it, PIG-2167 adds the syntax for CUBE and ROLLUP operation. It also rewrites the logical plan to insert the cubing and rollup udfs. PIG-2167 is just a syntactic sugar over the UDFs. The main idea of the MRCube algorithm is to handle holistic measures (PIG-2831). PIG-2831 patch is way too outdated and is not reviewed yet. So there might be many rough edges. PIG-2831 does the following
        1) Identify holistic measure and insert logical cube/rollup operators with some additional information
        2) Identify algebraic attributes (partitioning attribute)
        3) Inserts sampling job into MRPlan (added new sampling algorithm that reads N records randomly without reading the entire file)
        4) Added new intermediate storage formats that can write/read statistics
        5) Inserts post-processing job to aggregate the partitioned results

        All the above steps are required for value-partitioning algorithm for data distribution. It does not implement the batch-area identification (distributing the computation) as mentioned in the paper. It might be worthwhile to start off with value partitioning followed by batch areas and other optimizations like BUC etc. I would be more than happy to help you with any of these steps in my free time.

        Show
        Prasanth J added a comment - zhchbin Thanks for your interest. The cube work in Pig had many parts to it, PIG-2167 adds the syntax for CUBE and ROLLUP operation. It also rewrites the logical plan to insert the cubing and rollup udfs. PIG-2167 is just a syntactic sugar over the UDFs. The main idea of the MRCube algorithm is to handle holistic measures ( PIG-2831 ). PIG-2831 patch is way too outdated and is not reviewed yet. So there might be many rough edges. PIG-2831 does the following 1) Identify holistic measure and insert logical cube/rollup operators with some additional information 2) Identify algebraic attributes (partitioning attribute) 3) Inserts sampling job into MRPlan (added new sampling algorithm that reads N records randomly without reading the entire file) 4) Added new intermediate storage formats that can write/read statistics 5) Inserts post-processing job to aggregate the partitioned results All the above steps are required for value-partitioning algorithm for data distribution. It does not implement the batch-area identification (distributing the computation) as mentioned in the paper. It might be worthwhile to start off with value partitioning followed by batch areas and other optimizations like BUC etc. I would be more than happy to help you with any of these steps in my free time.

          People

          • Assignee:
            Prasanth J
            Reporter:
            Dmitriy V. Ryaboy
          • Votes:
            6 Vote for this issue
            Watchers:
            23 Start watching this issue

            Dates

            • Created:
              Updated:

              Development