Pig
  1. Pig
  2. PIG-3967

Grunt fail if we running more statement after first store

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We will hit output validation fail. The issue is caused by PIG-3545. We change PigServer.Graph.validateQuery() to invoke LogicalPlan.validate(), which will do the output validation. In Grunt mode, even after the first store, we will compile the entire statement cache, so the first store will be in the logical plan, validate on output fail.

      This makes datagenerator of Pigmix fail.

      1. PIG-3967-1.patch
        2 kB
        Daniel Dai
      2. PIG-3967-2.patch
        4 kB
        Daniel Dai
      3. PIG-3967-suggestion.patch
        6 kB
        Cheolsoo Park

        Issue Links

          Activity

          Hide
          Daniel Dai added a comment -

          The suggested patch looks good to me. Thanks Cheolsoo!

          Patch committed to trunk and 0.13 branch.

          Show
          Daniel Dai added a comment - The suggested patch looks good to me. Thanks Cheolsoo! Patch committed to trunk and 0.13 branch.
          Hide
          Cheolsoo Park added a comment -

          Here is the patch that includes my both suggestions. Feel free to reject it.

          Verified that it passes the test case, and Grunt doesn't fail running more than one stores.

          Show
          Cheolsoo Park added a comment - Here is the patch that includes my both suggestions. Feel free to reject it. Verified that it passes the test case, and Grunt doesn't fail running more than one stores.
          Hide
          Cheolsoo Park added a comment - - edited

          Daniel Dai, thank you for the explanation. If that's the case, let's pass a boolean param to LogicalPlan.validate(). So instead we can do the following-

          diff --git a/src/org/apache/pig/PigServer.java b/src/org/apache/pig/PigServer.java
          index 0b2285e..d7ec664 100644
          --- a/src/org/apache/pig/PigServer.java
          +++ b/src/org/apache/pig/PigServer.java
          @@ -1677,12 +1677,12 @@ public class PigServer {
                       }
                   }
          
          -        void validateQuery() throws FrontendException {
          +        private void validateQuery() throws FrontendException {
                       String query = buildQuery();
                       QueryParserDriver parserDriver = new QueryParserDriver( pigContext, scope, fileNameMap );
                       try {
                           LogicalPlan plan = parserDriver.parse( query );
          -                plan.validate(pigContext, scope);
          +                plan.validate(pigContext, scope, true);
                       } catch(FrontendException ex) {
                           scriptCache.remove( scriptCache.size() -1 );
                           throw ex;
          @@ -1741,7 +1741,7 @@ public class PigServer {
                   }
          
                   private void compile() throws IOException {
          -            lp.validate(pigContext, scope);
          +            lp.validate(pigContext, scope, false);
                       currDAG.postProcess();
                   }
          
          diff --git a/src/org/apache/pig/newplan/logical/relational/LogicalPlan.java b/src/org/apache/pig/newplan/logical/relational/LogicalPlan.java
          index dbbd5d1..b753121 100644
          --- a/src/org/apache/pig/newplan/logical/relational/LogicalPlan.java
          +++ b/src/org/apache/pig/newplan/logical/relational/LogicalPlan.java
          @@ -165,8 +165,9 @@ public class LogicalPlan extends BaseOperatorPlan {
          
                   return Integer.toString(hos.getHashCode().asInt());
               }
          -
          -    public void validate(PigContext pigContext, String scope) throws FrontendException {
          +
          +    public void validate(PigContext pigContext, String scope, boolean skipInputOutputValidation)
          +            throws FrontendException {
          
                   new DanglingNestedNodeRemover(this).visit();
                   new ColumnAliasConversionVisitor(this).visit();
          @@ -204,7 +205,7 @@ public class LogicalPlan extends BaseOperatorPlan {
                   // compute whether output data is sorted or not
                   new SortInfoSetter(this).visit();
          
          -        if (!(pigContext.inExplain || pigContext.inDumpSchema)) {
          +        if (!(skipInputOutputValidation || pigContext.inExplain || pigContext.inDumpSchema)) {
                       // Validate input/output file
                       new InputOutputFileValidatorVisitor(this, pigContext).visit();
                   }
          

          My main concern is that if you set a boolean flag to true and false before and after a method call, it will make someone wonder about the logic. It certainly did to me.

          I can upload a patch that includes my suggestions. Let me know.

          Show
          Cheolsoo Park added a comment - - edited Daniel Dai , thank you for the explanation. If that's the case, let's pass a boolean param to LogicalPlan.validate() . So instead we can do the following- diff --git a/src/org/apache/pig/PigServer.java b/src/org/apache/pig/PigServer.java index 0b2285e..d7ec664 100644 --- a/src/org/apache/pig/PigServer.java +++ b/src/org/apache/pig/PigServer.java @@ -1677,12 +1677,12 @@ public class PigServer { } } - void validateQuery() throws FrontendException { + private void validateQuery() throws FrontendException { String query = buildQuery(); QueryParserDriver parserDriver = new QueryParserDriver( pigContext, scope, fileNameMap ); try { LogicalPlan plan = parserDriver.parse( query ); - plan.validate(pigContext, scope); + plan.validate(pigContext, scope, true ); } catch (FrontendException ex) { scriptCache.remove( scriptCache.size() -1 ); throw ex; @@ -1741,7 +1741,7 @@ public class PigServer { } private void compile() throws IOException { - lp.validate(pigContext, scope); + lp.validate(pigContext, scope, false ); currDAG.postProcess(); } diff --git a/src/org/apache/pig/newplan/logical/relational/LogicalPlan.java b/src/org/apache/pig/newplan/logical/relational/LogicalPlan.java index dbbd5d1..b753121 100644 --- a/src/org/apache/pig/newplan/logical/relational/LogicalPlan.java +++ b/src/org/apache/pig/newplan/logical/relational/LogicalPlan.java @@ -165,8 +165,9 @@ public class LogicalPlan extends BaseOperatorPlan { return Integer .toString(hos.getHashCode().asInt()); } - - public void validate(PigContext pigContext, String scope) throws FrontendException { + + public void validate(PigContext pigContext, String scope, boolean skipInputOutputValidation) + throws FrontendException { new DanglingNestedNodeRemover( this ).visit(); new ColumnAliasConversionVisitor( this ).visit(); @@ -204,7 +205,7 @@ public class LogicalPlan extends BaseOperatorPlan { // compute whether output data is sorted or not new SortInfoSetter( this ).visit(); - if (!(pigContext.inExplain || pigContext.inDumpSchema)) { + if (!(skipInputOutputValidation || pigContext.inExplain || pigContext.inDumpSchema)) { // Validate input/output file new InputOutputFileValidatorVisitor( this , pigContext).visit(); } My main concern is that if you set a boolean flag to true and false before and after a method call, it will make someone wonder about the logic. It certainly did to me. I can upload a patch that includes my suggestions. Let me know.
          Hide
          Daniel Dai added a comment -

          Comment#1: the problem is even in Grunt mode, there are two different cases: validateQuery happens when replaying every statement in cache, in which we need to disable output validation; When the real LogicalPlan is constructed, it removes previous LOStore (Graph.skipStores), and after this, the output validation should happen. A flag in Grunt does not distinguish these two cases.

          Comment#2: Sure, I will change the test to use tempDir.

          Show
          Daniel Dai added a comment - Comment#1: the problem is even in Grunt mode, there are two different cases: validateQuery happens when replaying every statement in cache, in which we need to disable output validation; When the real LogicalPlan is constructed, it removes previous LOStore (Graph.skipStores), and after this, the output validation should happen. A flag in Grunt does not distinguish these two cases. Comment#2: Sure, I will change the test to use tempDir.
          Hide
          Cheolsoo Park added a comment -

          Actually, can you change output to something like tempDir/testGruntValidation in the test case? The tempDir is deleted by TestPigServer.tearDown(), so this seems better than create/delete output in the pwd.

          When running the test more than one time, I found output doesn't get cleaned if the test case fails. In fact, we should always clean it up.

          Show
          Cheolsoo Park added a comment - Actually, can you change output to something like tempDir/testGruntValidation in the test case? The tempDir is deleted by TestPigServer.tearDown() , so this seems better than create/delete output in the pwd. When running the test more than one time, I found output doesn't get cleaned if the test case fails. In fact, we should always clean it up.
          Hide
          Cheolsoo Park added a comment -

          Oh nevermind about #2. I had output dir in my workspace...

          Show
          Cheolsoo Park added a comment - Oh nevermind about #2. I had output dir in my workspace...
          Hide
          Cheolsoo Park added a comment -
          1. Can we avoid setting a boolean flag before and after validateQuery()? Why don't we set a boolean flag inside the Grunt constructor instead? Perhaps name it as inGrunt since it might come handy for other purposes?
            +                pigContext.inGruntValidation = true;
                             validateQuery();
            +                pigContext.inGruntValidation = false;
            
          2. The unit test fails for me-
            org.apache.pig.impl.plan.VisitorException: ERROR 6000:
            <line 2, column 0> Output Location Validation Failed for: 'file:///Users/cheolsoop/workspace/pig-apache/output More info to follow:
            Output directory file:/Users/cheolsoop/workspace/pig-apache/output already exists
            
          Show
          Cheolsoo Park added a comment - Can we avoid setting a boolean flag before and after validateQuery()? Why don't we set a boolean flag inside the Grunt constructor instead? Perhaps name it as inGrunt since it might come handy for other purposes? + pigContext.inGruntValidation = true ; validateQuery(); + pigContext.inGruntValidation = false ; The unit test fails for me- org.apache.pig.impl.plan.VisitorException: ERROR 6000: <line 2, column 0> Output Location Validation Failed for : 'file: ///Users/cheolsoop/workspace/pig-apache/output More info to follow: Output directory file:/Users/cheolsoop/workspace/pig-apache/output already exists
          Hide
          Daniel Dai added a comment -

          Add test case.

          Show
          Daniel Dai added a comment - Add test case.

            People

            • Assignee:
              Daniel Dai
              Reporter:
              Daniel Dai
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development