Pig
  1. Pig
  2. PIG-1926

Sample/Limit should take scalar

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels:
    • Release Note:
      Hide
      Limit and Sample now accept a variable (scalar) as argument.

      For example, the new Limit command allows the following syntax to get the top 1% of a sorted file:
      [ a = LOAD 'a.txt'; b = GROUP a all; c = FOREACH b GENERATE COUNT(a) AS sum; d = ORDER a BY $0; e = LIMIT d c.sum/100; ]

      Only scalar variables may be used in the expression in limit or sample, columns in the input relation for the operation cannot be used in the expression. A statement like [ e = LIMIT d $0; ] is invalid.
      The new Sample command allows for the same syntax.

      Using a variable instead of a constant in Limit automatically disables most of the optimizations (only push-before-foreach is performed). More work is needed to enable optimizations for limit-after-sort, limit duplication before cross/union and limit merging.
      Show
      Limit and Sample now accept a variable (scalar) as argument. For example, the new Limit command allows the following syntax to get the top 1% of a sorted file: [ a = LOAD 'a.txt'; b = GROUP a all; c = FOREACH b GENERATE COUNT(a) AS sum; d = ORDER a BY $0; e = LIMIT d c.sum/100; ] Only scalar variables may be used in the expression in limit or sample, columns in the input relation for the operation cannot be used in the expression. A statement like [ e = LIMIT d $0; ] is invalid. The new Sample command allows for the same syntax. Using a variable instead of a constant in Limit automatically disables most of the optimizations (only push-before-foreach is performed). More work is needed to enable optimizations for limit-after-sort, limit duplication before cross/union and limit merging.

      Description

      Currently, Limit, Sample only takes a constant. It would be better we can use a scalar in the place of constant. Eg:

      a = load 'a.txt';
      b = group a all;
      c = foreach b generate COUNT(a) as sum;
      d = order a by $0;
      e = limit d c.sum/100;
      

      This is a candidate project for Google summer of code 2011. More information about the program can be found at http://wiki.apache.org/pig/GSoc2011

      1. PIG-1926.10.patch
        45 kB
        Gianmarco De Francisci Morales
      2. PIG-1926.11.patch
        51 kB
        Gianmarco De Francisci Morales
      3. PIG-1926.12.1.patch
        56 kB
        Thejas M Nair
      4. PIG-1926.12.patch
        55 kB
        Gianmarco De Francisci Morales
      5. PIG-1926.7.patch
        34 kB
        Gianmarco De Francisci Morales
      6. PIG-1926.8.patch
        41 kB
        Gianmarco De Francisci Morales
      7. PIG-1926.9.patch
        47 kB
        Gianmarco De Francisci Morales
      8. PIG-1926.patch
        26 kB
        Gianmarco De Francisci Morales
      9. PIG-1926.patch
        26 kB
        Gianmarco De Francisci Morales
      10. PIG-1926.patch
        86 kB
        Gianmarco De Francisci Morales
      11. PIG-1926.patch
        76 kB
        Gianmarco De Francisci Morales
      12. PIG-1926.patch
        74 kB
        Gianmarco De Francisci Morales
      13. PIG-1926.patch
        28 kB
        Gianmarco De Francisci Morales

        Issue Links

          Activity

          Hide
          Gianmarco De Francisci Morales added a comment -

          This looks quite useful.
          If Ciemiewicz is already working on Reservoir sampling, I think I will apply for this one.
          I assume there will probably be some integration point between the two projects.

          Show
          Gianmarco De Francisci Morales added a comment - This looks quite useful. If Ciemiewicz is already working on Reservoir sampling, I think I will apply for this one. I assume there will probably be some integration point between the two projects.
          Hide
          Gianmarco De Francisci Morales added a comment -

          Hi, I have drafted my proposal for this project.
          I assumed Daniel would be the mentor.

          http://socghop.appspot.com/gsoc/proposal/review/google/gsoc2011/azaroth/1

          Any comment is highly appreciated.

          Show
          Gianmarco De Francisci Morales added a comment - Hi, I have drafted my proposal for this project. I assumed Daniel would be the mentor. http://socghop.appspot.com/gsoc/proposal/review/google/gsoc2011/azaroth/1 Any comment is highly appreciated.
          Hide
          Gianmarco De Francisci Morales added a comment -

          First implementation of the feature.
          Focusing on LIMIT for now.

          • Modified the grammar to also allow an expression in LIMIT.
            Retained old behavior and code-path for constant expressions.
            The rule is split into 2 fragments to solve problems with ANTLR's syntactic predicates and local variables. (see The Definitive ANTLR Reference, 14.7 Issues with Actions and Syntactic Predicates, p.353)
          • Modified LOLimit to keep also an expression plan.
            Constant expressions have priority over expression plans. (even though both should never be set at the same time).
          • Modified LogicalPlanBuilder to allow creation of LOLimit in 2 steps.

          The expression evaluation does not work yet because I need to modify POLimit accordingly, but it compiles, the changes do not disrupt the old behaviour and no runtime exception is thrown when using the new one.

          Next steps:
          Modify POLimit to evaluate the expression.
          Evaluate which changes are required in LogToPhyTranslationVisitor to correctly create POLimit.
          Evaluate which changes are required in TypeCheckingRelVisitor to ensure type safety.
          Evaluate which changes are required in MRCompiler to retain correct POLimit compilation
          (LimitOptimizer breaks the current modifications).

          There are some whitespace/format changes in the current patch due to automatic formatting.
          Should I try to remove them?

          Show
          Gianmarco De Francisci Morales added a comment - First implementation of the feature. Focusing on LIMIT for now. Modified the grammar to also allow an expression in LIMIT. Retained old behavior and code-path for constant expressions. The rule is split into 2 fragments to solve problems with ANTLR's syntactic predicates and local variables. (see The Definitive ANTLR Reference, 14.7 Issues with Actions and Syntactic Predicates, p.353) Modified LOLimit to keep also an expression plan. Constant expressions have priority over expression plans. (even though both should never be set at the same time). Modified LogicalPlanBuilder to allow creation of LOLimit in 2 steps. The expression evaluation does not work yet because I need to modify POLimit accordingly, but it compiles, the changes do not disrupt the old behaviour and no runtime exception is thrown when using the new one. Next steps: Modify POLimit to evaluate the expression. Evaluate which changes are required in LogToPhyTranslationVisitor to correctly create POLimit. Evaluate which changes are required in TypeCheckingRelVisitor to ensure type safety. Evaluate which changes are required in MRCompiler to retain correct POLimit compilation (LimitOptimizer breaks the current modifications). There are some whitespace/format changes in the current patch due to automatic formatting. Should I try to remove them?
          Hide
          Gianmarco De Francisci Morales added a comment -

          First working version for LIMIT.
          POLimit correctly evaluates the limit at runtime.
          AllExpressionVisitor visits LOLimit, discovers the scalar expression and correctly sets the plan.
          Visitors do the visit only if the plan has been set, so the normal code path is not disrupted.

          Is there a consistent formatting template for Pig in order to make the patch more readable? (half of the modified lines are from the code formatter).

          Next steps:
          Evaluate the interaction with optimizers.
          Translate the script used for testing into a unit test for the feature.

          Show
          Gianmarco De Francisci Morales added a comment - First working version for LIMIT. POLimit correctly evaluates the limit at runtime. AllExpressionVisitor visits LOLimit, discovers the scalar expression and correctly sets the plan. Visitors do the visit only if the plan has been set, so the normal code path is not disrupted. Is there a consistent formatting template for Pig in order to make the patch more readable? (half of the modified lines are from the code formatter). Next steps: Evaluate the interaction with optimizers. Translate the script used for testing into a unit test for the feature.
          Hide
          Gianmarco De Francisci Morales added a comment -

          Modified TypeCheckingRelVisitor in order to correctly cast expression in LOLimit.
          Now the visitor takes into account that LOLimit has an expression.
          The return type of the expression must be Integer or Long.

          Show
          Gianmarco De Francisci Morales added a comment - Modified TypeCheckingRelVisitor in order to correctly cast expression in LOLimit. Now the visitor takes into account that LOLimit has an expression. The return type of the expression must be Integer or Long.
          Hide
          Gianmarco De Francisci Morales added a comment -

          Addressed TestLogToPhyCompiler failure.
          TestLogToPhyCompiler.testLimit():
          Modified grammar file LogicalPlanGenerator.g to preserve previous AST structure.
          The previous modification disrupted tests that directly interacted with ASTs.
          The new grammar just adds expr() as a side option to the original limit_clause() rule.

          Addressed TestGrunt failure.
          Modified TypeCheckingRelVisitor in order to correctly cast expression in LOLimit.
          Visit the expression plan only if present.

          Disabled optimizer for variable LIMIT case.

          Added simple test for variable LIMIT.

          Attempt to fix things in MR layer:
          Added a limitPlan to MapReduceOper (not sure it's useful).
          Adapted the check on the existence of a limit in MROper
          Copy also the plan when creating a new POLimit in MRCompiler.

          TODO:
          Test whether it works in non-local mode (how should I do that without a cluster?)
          Evaluate what else to change in MR layer.

          Show
          Gianmarco De Francisci Morales added a comment - Addressed TestLogToPhyCompiler failure. TestLogToPhyCompiler.testLimit(): Modified grammar file LogicalPlanGenerator.g to preserve previous AST structure. The previous modification disrupted tests that directly interacted with ASTs. The new grammar just adds expr() as a side option to the original limit_clause() rule. Addressed TestGrunt failure. Modified TypeCheckingRelVisitor in order to correctly cast expression in LOLimit. Visit the expression plan only if present. Disabled optimizer for variable LIMIT case. Added simple test for variable LIMIT. Attempt to fix things in MR layer: Added a limitPlan to MapReduceOper (not sure it's useful). Adapted the check on the existence of a limit in MROper Copy also the plan when creating a new POLimit in MRCompiler. TODO: Test whether it works in non-local mode (how should I do that without a cluster?) Evaluate what else to change in MR layer.
          Hide
          Thejas M Nair added a comment -

          Is there a consistent formatting template for Pig in order to make the patch more readable? (half of the modified lines are from the code formatter).

          Officially, pig is supposed to follow the Oracle/Sun coding conventions - http://wiki.apache.org/pig/HowToContribute. But in that large list, only things like use of 4 tabs for indentation and limiting length of a line to around 80 chars is what is almost always followed.
          Can you please try to disable auto-formatting and format only the sections of code that you are changing (cmd-i in eclipse on mac)? Otherwise, it is difficult to distinguish between a real change and formatting change.

          Show
          Thejas M Nair added a comment - Is there a consistent formatting template for Pig in order to make the patch more readable? (half of the modified lines are from the code formatter). Officially, pig is supposed to follow the Oracle/Sun coding conventions - http://wiki.apache.org/pig/HowToContribute . But in that large list, only things like use of 4 tabs for indentation and limiting length of a line to around 80 chars is what is almost always followed. Can you please try to disable auto-formatting and format only the sections of code that you are changing (cmd-i in eclipse on mac)? Otherwise, it is difficult to distinguish between a real change and formatting change.
          Hide
          Gianmarco De Francisci Morales added a comment -

          I tried to clean up the patch in order to make it more readable.
          Please let me know if it is OK.

          Show
          Gianmarco De Francisci Morales added a comment - I tried to clean up the patch in order to make it more readable. Please let me know if it is OK.
          Hide
          Thejas M Nair added a comment -

          The new patch without the formatting changes is good, it is more readable. I am reviewing it.

          Test whether it works in non-local mode (how should I do that without a cluster?)

          MiniCluster is used to test non-local mode. You can look at the test in - http://svn.apache.org/viewvc/pig/trunk/test/org/apache/pig/test/TestLimitAdjuster.java

          Show
          Thejas M Nair added a comment - The new patch without the formatting changes is good, it is more readable. I am reviewing it. Test whether it works in non-local mode (how should I do that without a cluster?) MiniCluster is used to test non-local mode. You can look at the test in - http://svn.apache.org/viewvc/pig/trunk/test/org/apache/pig/test/TestLimitAdjuster.java
          Hide
          Gianmarco De Francisci Morales added a comment -

          Ported TestLimitVariable to use MiniCluster.
          The test works also in MR mode.

          Show
          Gianmarco De Francisci Morales added a comment - Ported TestLimitVariable to use MiniCluster. The test works also in MR mode.
          Hide
          Thejas M Nair added a comment -

          Comments on the most recent patch -

          • The expression in limit should be allowed to refer to columns only in scalar context, ie pig should give a proper error message when such a statement is created. A Right now it allows the statement "lim = limit l $0;", and that results in NPE . You can create a validation Visitor to check this, that gets called from PigServer.Graph.compile(lp).
          • If the expression in limit evaluates to bytearray, it can be implicitly cast to a long. This can be done in the typechecker.
          • The test files are missing. Maybe you forgot to include test dir in diff?

          I still need to review the MRCompiler changes.

          Can you use a different file name for each patch (eg PIG-1926.5.patch), and also refer to that file name in your comments? That way it is easier to identify the downloaded patch files and associate them with the comments.

          Show
          Thejas M Nair added a comment - Comments on the most recent patch - The expression in limit should be allowed to refer to columns only in scalar context, ie pig should give a proper error message when such a statement is created. A Right now it allows the statement "lim = limit l $0;", and that results in NPE . You can create a validation Visitor to check this, that gets called from PigServer.Graph.compile(lp). If the expression in limit evaluates to bytearray, it can be implicitly cast to a long. This can be done in the typechecker. The test files are missing. Maybe you forgot to include test dir in diff? I still need to review the MRCompiler changes. Can you use a different file name for each patch (eg PIG-1926 .5.patch), and also refer to that file name in your comments? That way it is easier to identify the downloaded patch files and associate them with the comments.
          Hide
          Thejas M Nair added a comment -

          The following visitors also should visit the expression in limit - DotLOPrinter,LogicalPlanPrinter,SchemaResetter,UidResetter.
          MRCompiler changes look good, some additional changes would be required (for sort with limit) there once optimizations in LimitOptimizer are enabled.

          Show
          Thejas M Nair added a comment - The following visitors also should visit the expression in limit - DotLOPrinter,LogicalPlanPrinter,SchemaResetter,UidResetter. MRCompiler changes look good, some additional changes would be required (for sort with limit) there once optimizations in LimitOptimizer are enabled.
          Hide
          Gianmarco De Francisci Morales added a comment -

          The expression in limit should be allowed to refer to columns only in scalar context, ie pig should give a proper error message when such a statement is created. A Right now it allows the statement "lim = limit l $0;", and that results in NPE . You can create a validation Visitor to check this, that gets called from PigServer.Graph.compile(lp).

          Thanks for pointing this out. I hadn't thought about this.
          I suppose I should check that there is a scalar expression in the expression plan.
          But this would disallow using UDFs, wouldn't it?
          Is there another way to check the context of an expression?

          If the expression in limit evaluates to bytearray, it can be implicitly cast to a long. This can be done in the typechecker.

          Isn't this dangerous? We would arbitrarily cast the first 8 bytes of the array to long.
          What if the byte array is larger/smaller than this?
          What would be the use case for such a feature?

          The test files are missing. Maybe you forgot to include test dir in diff?

          Yes I forgot to add the file to svn. I re.added it in PIG-1926.7.patch.

          The following visitors also should visit the expression in limit - DotLOPrinter,LogicalPlanPrinter,SchemaResetter,UidResetter.

          Addedd in PIG-1926.7.patch

          MRCompiler changes look good, some additional changes would be required (for sort with limit) there once optimizations in LimitOptimizer are enabled.

          Sort with limit is used only in MRCompiler, right?

          Show
          Gianmarco De Francisci Morales added a comment - The expression in limit should be allowed to refer to columns only in scalar context, ie pig should give a proper error message when such a statement is created. A Right now it allows the statement "lim = limit l $0;", and that results in NPE . You can create a validation Visitor to check this, that gets called from PigServer.Graph.compile(lp). Thanks for pointing this out. I hadn't thought about this. I suppose I should check that there is a scalar expression in the expression plan. But this would disallow using UDFs, wouldn't it? Is there another way to check the context of an expression? If the expression in limit evaluates to bytearray, it can be implicitly cast to a long. This can be done in the typechecker. Isn't this dangerous? We would arbitrarily cast the first 8 bytes of the array to long. What if the byte array is larger/smaller than this? What would be the use case for such a feature? The test files are missing. Maybe you forgot to include test dir in diff? Yes I forgot to add the file to svn. I re.added it in PIG-1926 .7.patch. The following visitors also should visit the expression in limit - DotLOPrinter,LogicalPlanPrinter,SchemaResetter,UidResetter. Addedd in PIG-1926 .7.patch MRCompiler changes look good, some additional changes would be required (for sort with limit) there once optimizations in LimitOptimizer are enabled. Sort with limit is used only in MRCompiler, right?
          Hide
          Thejas M Nair added a comment -

          I suppose I should check that there is a scalar expression in the expression plan.
          But this would disallow using UDFs, wouldn't it?
          Is there another way to check the context of an expression?

          The columns will map to a ProjectExpression. If the visitor finds a ProjectExpression, it can throw an error.

          Isn't this dangerous? We would arbitrarily cast the first 8 bytes of the array to long.
          What if the byte array is larger/smaller than this?
          What would be the use case for such a feature?

          In pig the types are optional, and the default type is bytearray. bytearray is cast to other types based on the context in Typechecker. For example, see - http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#Arithmetic+Operators
          If the bytearray cannot be cast to long, you will get an error in Result.returnStatus .

          for example -

          a = load 'x' as (id, num);
          fil = filter a by id == '123'; -- assuming this returns one row
          b = load 'y';
          lim = limit b fil.num ; -- fil.num should be cast to long
          

          In POLimit.getNext(Tuple t), while computing the value of the limit expression, the value of Result.returnStatus needs to be checked. If it is not STATUS_OK, it should give an error. The assert will give an error only if the assertions are explicitly enabled at runtime, so the assert should be replaced by a code that always throws an exception if the condition is not satisfied.

          Sort with limit is used only in MRCompiler, right?

          Are you asking if the changes for use of limit expression in sort are required only in MRCompiler ? Yes, thats the only place where change should be needed, apart from LimitOptimizer.

          Show
          Thejas M Nair added a comment - I suppose I should check that there is a scalar expression in the expression plan. But this would disallow using UDFs, wouldn't it? Is there another way to check the context of an expression? The columns will map to a ProjectExpression. If the visitor finds a ProjectExpression, it can throw an error. Isn't this dangerous? We would arbitrarily cast the first 8 bytes of the array to long. What if the byte array is larger/smaller than this? What would be the use case for such a feature? In pig the types are optional, and the default type is bytearray. bytearray is cast to other types based on the context in Typechecker. For example, see - http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#Arithmetic+Operators If the bytearray cannot be cast to long, you will get an error in Result.returnStatus . for example - a = load 'x' as (id, num); fil = filter a by id == '123'; -- assuming this returns one row b = load 'y'; lim = limit b fil.num ; -- fil.num should be cast to long In POLimit.getNext(Tuple t), while computing the value of the limit expression, the value of Result.returnStatus needs to be checked. If it is not STATUS_OK, it should give an error. The assert will give an error only if the assertions are explicitly enabled at runtime, so the assert should be replaced by a code that always throws an exception if the condition is not satisfied. Sort with limit is used only in MRCompiler, right? Are you asking if the changes for use of limit expression in sort are required only in MRCompiler ? Yes, thats the only place where change should be needed, apart from LimitOptimizer.
          Hide
          Gianmarco De Francisci Morales added a comment -

          The expression in limit should be allowed to refer to columns only in scalar context, ie pig should give a proper error message when such a statement is created. A Right now it allows the statement "lim = limit l $0;", and that results in NPE . You can create a validation Visitor to check this, that gets called from PigServer.Graph.compile(lp).

          Addressed in PIG-1926.8.patch

          Added LimitVariableValidator visitor to validate variable limit expression.
          – If a ProjectExpression is found (intead of a ScalarExpression) it throws a FrontendException
          – Modified PigServer to call the visitor
          – Added a test for this case in TestLimitVariable

          If the expression in limit evaluates to bytearray, it can be implicitly cast to a long. This can be done in the typechecker.

          Addressed in PIG-1926.8.patch

          – Added implicit cast from bytearray to long in TypeCheckingRelVisitor
          – Added test for it in TestLimitVariable

          In POLimit.getNext(Tuple t), while computing the value of the limit expression, the value of Result.returnStatus needs to be checked. If it is not STATUS_OK, it should give an error. The assert will give an error only if the assertions are explicitly enabled at runtime, so the assert should be replaced by a code that always throws an exception if the condition is not satisfied.

          Addressed in PIG-1926.8.patch

          – Added runtime check of expression result in POLimit

          TODO:
          Exception refinement. Right now I throw mostly RuntimeExceptions.

          Discuss which modifications to MRCompiler are still needed.

          In my opinion optimizations at this stage are not mandatory, we can postpone them for later. As stated in the project, the important thing is that optimizers continue working for the constant case and do not break for the variable case.

          If you agree I would start working on SAMPLE.

          Show
          Gianmarco De Francisci Morales added a comment - The expression in limit should be allowed to refer to columns only in scalar context, ie pig should give a proper error message when such a statement is created. A Right now it allows the statement "lim = limit l $0;", and that results in NPE . You can create a validation Visitor to check this, that gets called from PigServer.Graph.compile(lp). Addressed in PIG-1926 .8.patch Added LimitVariableValidator visitor to validate variable limit expression. – If a ProjectExpression is found (intead of a ScalarExpression) it throws a FrontendException – Modified PigServer to call the visitor – Added a test for this case in TestLimitVariable If the expression in limit evaluates to bytearray, it can be implicitly cast to a long. This can be done in the typechecker. Addressed in PIG-1926 .8.patch – Added implicit cast from bytearray to long in TypeCheckingRelVisitor – Added test for it in TestLimitVariable In POLimit.getNext(Tuple t), while computing the value of the limit expression, the value of Result.returnStatus needs to be checked. If it is not STATUS_OK, it should give an error. The assert will give an error only if the assertions are explicitly enabled at runtime, so the assert should be replaced by a code that always throws an exception if the condition is not satisfied. Addressed in PIG-1926 .8.patch – Added runtime check of expression result in POLimit TODO: Exception refinement. Right now I throw mostly RuntimeExceptions. Discuss which modifications to MRCompiler are still needed. In my opinion optimizations at this stage are not mandatory, we can postpone them for later. As stated in the project, the important thing is that optimizers continue working for the constant case and do not break for the variable case. If you agree I would start working on SAMPLE.
          Hide
          Thejas M Nair added a comment -

          I agree that limit optimizations can be supported for this new feature at a later time.

          PIG-1926.8.patch looks good. In testLimitVariableException1, can you also check for the error message in exception using Util.checkMessageInException ?
          Can you also add two additional test cases ? -

          • test that checks for the error message when the limit expression evaluates to wrong expression type
          • test with limit in a nested foreach statement
          Show
          Thejas M Nair added a comment - I agree that limit optimizations can be supported for this new feature at a later time. PIG-1926 .8.patch looks good. In testLimitVariableException1, can you also check for the error message in exception using Util.checkMessageInException ? Can you also add two additional test cases ? - test that checks for the error message when the limit expression evaluates to wrong expression type test with limit in a nested foreach statement
          Hide
          Gianmarco De Francisci Morales added a comment -

          Added support and test for nested limit variable in PIG-1926.9.patch.

          Show
          Gianmarco De Francisci Morales added a comment - Added support and test for nested limit variable in PIG-1926 .9.patch.
          Hide
          Gianmarco De Francisci Morales added a comment -

          Attaching PIG-1926.10.patch:
          Implemented support and tests for Sample.
          Fixed a problem with syntactic predicates in Limit, now it does not break on inline_op.

          All tests pass on my machine.
          We can call this a release candidate.
          I think it is ready for review.

          Show
          Gianmarco De Francisci Morales added a comment - Attaching PIG-1926 .10.patch: Implemented support and tests for Sample. Fixed a problem with syntactic predicates in Limit, now it does not break on inline_op. All tests pass on my machine. We can call this a release candidate. I think it is ready for review.
          Hide
          Thejas M Nair added a comment -

          There is a conflict in LogicalPlanBuilder.java, when the patch is applied to latest trunk. LimitVariableValidator.java is missing from the patch. Can you please update your svn tree and regenerate the patch ?

          Show
          Thejas M Nair added a comment - There is a conflict in LogicalPlanBuilder.java, when the patch is applied to latest trunk. LimitVariableValidator.java is missing from the patch. Can you please update your svn tree and regenerate the patch ?
          Hide
          Gianmarco De Francisci Morales added a comment -

          Attaching PIG-1926.11.patch

          Rebased my work on latest trunk to solve the conflict.
          Refined exception handling for LimitVariableValidator.
          Added a test for exception thrown by TypeCheckingRelVisitor in TestLimitVariable (which I had forgot).

          Show
          Gianmarco De Francisci Morales added a comment - Attaching PIG-1926 .11.patch Rebased my work on latest trunk to solve the conflict. Refined exception handling for LimitVariableValidator. Added a test for exception thrown by TypeCheckingRelVisitor in TestLimitVariable (which I had forgot).
          Hide
          Thejas M Nair added a comment -

          PIG-1926.11.patch - Sample also needs a check similar to that for Limit in LimitVariableValidator, to give error for statements such as - "f = sample l $0;"
          LimitVariableValidator could be replaced by new class that does the check for both limit and sample operations. Can you also add a negative test case for this? Everything else looks good.

          Show
          Thejas M Nair added a comment - PIG-1926 .11.patch - Sample also needs a check similar to that for Limit in LimitVariableValidator, to give error for statements such as - "f = sample l $0;" LimitVariableValidator could be replaced by new class that does the check for both limit and sample operations. Can you also add a negative test case for this? Everything else looks good.
          Hide
          Gianmarco De Francisci Morales added a comment -

          PIG-1926.12.patch addresses Thejas' comments:

          • Rebased my work on current trunk.
          • Renamed LimitVariableValidator to ScalarVariableValidator.
            The new validator checks for out-of-context references to scalar variables both in Limit and Sample. To make it work for Sample, I modified LOFilter to keep track of whether the operator is the result of a Sample command (otherwise I would disallow projections in Filter).
          • Added a test for the exception in TestSample
          Show
          Gianmarco De Francisci Morales added a comment - PIG-1926 .12.patch addresses Thejas' comments: Rebased my work on current trunk. Renamed LimitVariableValidator to ScalarVariableValidator. The new validator checks for out-of-context references to scalar variables both in Limit and Sample. To make it work for Sample, I modified LOFilter to keep track of whether the operator is the result of a Sample command (otherwise I would disallow projections in Filter). Added a test for the exception in TestSample
          Hide
          Thejas M Nair added a comment -

          +1 for PIG-1926.12.patch. Two new files were missing apache license header files, I have added that (PIG-1926.12.1.patch) and committed it to trunk.

          Gianmarco, thanks for the contribution!

          Show
          Thejas M Nair added a comment - +1 for PIG-1926 .12.patch. Two new files were missing apache license header files, I have added that ( PIG-1926 .12.1.patch) and committed it to trunk. Gianmarco, thanks for the contribution!
          Hide
          Gianmarco De Francisci Morales added a comment -

          Thank you!
          It was a pleasure.

          Show
          Gianmarco De Francisci Morales added a comment - Thank you! It was a pleasure.

            People

            • Assignee:
              Gianmarco De Francisci Morales
              Reporter:
              Daniel Dai
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development