Pig
  1. Pig
  2. PIG-2430

An EvalFunc which overrides getArgToFuncMapping with FuncSpec with constructor arguments is not properly instantiated with said arguments

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.10.0, 0.11
    • Fix Version/s: 0.10.0, 0.11
    • Component/s: None
    • Labels:
      None

      Description

      If you override getArgToFuncMapping and any of the FuncSpec's specify constructor arguments, those arguments are currently ignored. Thankfully, there is a one line fix (it's funny that this has never been run into before, but is an). Patch with tests incoming. I assume that this affects 0.10 and 0.9 but haven't tested, I was just working with trunk.

      1. PIG2430.patch
        6 kB
        Jonathan Coveney
      2. PIG2430_2.patch
        9 kB
        Jonathan Coveney
      3. PIG2430_2.010.patch
        9 kB
        Thejas M Nair
      4. PIG2430_1.patch
        5 kB
        Jonathan Coveney
      5. DummySize.patch
        0.8 kB
        Thejas M Nair

        Issue Links

          Activity

          Hide
          Jonathan Coveney added a comment -

          I added a new test. Ran ant test-commit successful. Should test on 0.10 and 0.9

          Show
          Jonathan Coveney added a comment - I added a new test. Ran ant test-commit successful. Should test on 0.10 and 0.9
          Hide
          Jonathan Coveney added a comment -

          I patched .9 and .10 and ran ant test-commit successfully too

          Show
          Jonathan Coveney added a comment - I patched .9 and .10 and ran ant test-commit successfully too
          Hide
          Dmitriy V. Ryaboy added a comment -

          Nice fix, but make the test use local mode. Also might want to move the UDF classes into the test itself, it's not like someone is likely to reuse it from the utils package.

          Show
          Dmitriy V. Ryaboy added a comment - Nice fix, but make the test use local mode. Also might want to move the UDF classes into the test itself, it's not like someone is likely to reuse it from the utils package.
          Hide
          Jonathan Coveney added a comment -

          This takes into account Dmitriy's suggestions. I successfully ran ant test-commit on trunk, and ant -Dtestcase=TestUDF test on 0.9 and 0.10.

          Show
          Jonathan Coveney added a comment - This takes into account Dmitriy's suggestions. I successfully ran ant test-commit on trunk, and ant -Dtestcase=TestUDF test on 0.9 and 0.10.
          Hide
          Jonathan Coveney added a comment -

          I ran ant test-commit successfully on 0.9 and 0.10

          Show
          Jonathan Coveney added a comment - I ran ant test-commit successfully on 0.9 and 0.10
          Hide
          Thejas M Nair added a comment -
          grunt> define SZ SIZE('10');        
          grunt> l = load 't.txt' as (a,b);   
          grunt> f = foreach l generate SZ(a);
          grunt> dump f;
          
          

          I have attached a patch with a change to size udf to return a dummy size value passed in the constructor. (the change is just meant to demonstrate the issue)

          With the change proposed in PIG2430_1.patch, the constructor argument does not get set. Without the PIG2430_1.patch, the SZ(a) call will return 10, with it returns 0.

          Show
          Thejas M Nair added a comment - grunt> define SZ SIZE('10'); grunt> l = load 't.txt' as (a,b); grunt> f = foreach l generate SZ(a); grunt> dump f; I have attached a patch with a change to size udf to return a dummy size value passed in the constructor. (the change is just meant to demonstrate the issue) With the change proposed in PIG2430_1.patch, the constructor argument does not get set. Without the PIG2430_1.patch, the SZ(a) call will return 10, with it returns 0.
          Hide
          Jonathan Coveney added a comment -

          Thejas, thanks again for taking a look. I see what you mean now. I think there are a couple of resolutions to this, the question comes down to: in the case of an explicit define statement, but where there is also a matching getArgToFuncMapping, which should take precedence? For an explicit parameter (ie you dummy sIZE example), it seems pretty clear that the user defined parameter should take precedence. With no parameter it is less clear...there are 2 reasons why people would use a define statement.
          1) to explicitly define a constructor parameter (or intentionally 0 constructor parameters)
          2) to be able to refer to a UDF by an alias (in which case, the argument by the getArgToFuncMapping may be appropriate)

          This ambiguity could be why this argument was ignored in the first place, but given it's there, I think it could be useful in UDFs, we just have to be clear on the semantics.

          Show
          Jonathan Coveney added a comment - Thejas, thanks again for taking a look. I see what you mean now. I think there are a couple of resolutions to this, the question comes down to: in the case of an explicit define statement, but where there is also a matching getArgToFuncMapping, which should take precedence? For an explicit parameter (ie you dummy sIZE example), it seems pretty clear that the user defined parameter should take precedence. With no parameter it is less clear...there are 2 reasons why people would use a define statement. 1) to explicitly define a constructor parameter (or intentionally 0 constructor parameters) 2) to be able to refer to a UDF by an alias (in which case, the argument by the getArgToFuncMapping may be appropriate) This ambiguity could be why this argument was ignored in the first place, but given it's there, I think it could be useful in UDFs, we just have to be clear on the semantics.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Is there a way for us to tell if a UDF was DEFINEd or just used directly? That way we could tell if no-arguments was forced through a define, or if we should obey the args passed via argToFuncMapping.

          Show
          Dmitriy V. Ryaboy added a comment - Is there a way for us to tell if a UDF was DEFINEd or just used directly? That way we could tell if no-arguments was forced through a define, or if we should obey the args passed via argToFuncMapping.
          Hide
          Thejas M Nair added a comment -

          Is there a way for us to tell if a UDF was DEFINEd or just used directly?

          The information is not currently available in TypeCheckingExpVisitor, but it should be possible to pass that along. LogicalPlanBuilder.buildUDF is where the replacement with defined udf happens.

          This ambiguity could be why this argument was ignored in the first place, but given it's there, I think it could be useful in UDFs, we just have to be clear on the semantics.

          I think we can define the semantics as the getArgsToFuncMapping() returns function with default constructor arguments, which can be overridden by define statement. I think it makes sense to add this info to the javadoc comments for getArgsToFuncMapping.

          We need to think about how this constructor argument semantics should be defined with the new EvalFunc interface in PIG-2421.

          Show
          Thejas M Nair added a comment - Is there a way for us to tell if a UDF was DEFINEd or just used directly? The information is not currently available in TypeCheckingExpVisitor, but it should be possible to pass that along. LogicalPlanBuilder.buildUDF is where the replacement with defined udf happens. This ambiguity could be why this argument was ignored in the first place, but given it's there, I think it could be useful in UDFs, we just have to be clear on the semantics. I think we can define the semantics as the getArgsToFuncMapping() returns function with default constructor arguments, which can be overridden by define statement. I think it makes sense to add this info to the javadoc comments for getArgsToFuncMapping. We need to think about how this constructor argument semantics should be defined with the new EvalFunc interface in PIG-2421 .
          Hide
          Jonathan Coveney added a comment -

          I agree that this experience should definitely inform the new EvalFunc design. I added a new comment

          As far as this patch, I added your case to TestUDF, and took it into account. There is a part of LogicalPlanBuilder.java's buildUDF that is only called if there was no define statement for the alias in question. Thus, I added a field to UserFuncExpression which holds true if it was created via a define, and false otherwise.

          Thus, if you were to do...

          define my myudf('val');
          a = load ...
          b = foreach a generate my($0), myudf($0);
          

          In this case, the my would have any FuncSpec values overriden by the define statement, but myudf would not, since it was not via an alias.

          Let me know if this seems like an appropriate way to do this!

          Show
          Jonathan Coveney added a comment - I agree that this experience should definitely inform the new EvalFunc design. I added a new comment As far as this patch, I added your case to TestUDF, and took it into account. There is a part of LogicalPlanBuilder.java's buildUDF that is only called if there was no define statement for the alias in question. Thus, I added a field to UserFuncExpression which holds true if it was created via a define, and false otherwise. Thus, if you were to do... define my myudf('val'); a = load ... b = foreach a generate my($0), myudf($0); In this case, the my would have any FuncSpec values overriden by the define statement, but myudf would not, since it was not via an alias. Let me know if this seems like an appropriate way to do this!
          Hide
          Jonathan Coveney added a comment -

          Thejas,

          Was wondering if you might be able to get this a look to see if it seems reasonable.

          Show
          Jonathan Coveney added a comment - Thejas, Was wondering if you might be able to get this a look to see if it seems reasonable.
          Hide
          Thejas M Nair added a comment -

          Sorry, I had forgot about this jira. Thanks for reminding!
          This looks good. I will commit after running unit tests and test-patch.

          Show
          Thejas M Nair added a comment - Sorry, I had forgot about this jira. Thanks for reminding! This looks good. I will commit after running unit tests and test-patch.
          Hide
          Thejas M Nair added a comment -

          Does anybody feel strongly to have this in 0.9 branch? While I don't see the change as being risky, I think it is better to minimize the patches to 0.9 in the interest of keeping it stable.

          Show
          Thejas M Nair added a comment - Does anybody feel strongly to have this in 0.9 branch? While I don't see the change as being risky, I think it is better to minimize the patches to 0.9 in the interest of keeping it stable.
          Hide
          Dmitriy V. Ryaboy added a comment -

          I'm ok with keeping this out of 0.9 branch.

          Show
          Dmitriy V. Ryaboy added a comment - I'm ok with keeping this out of 0.9 branch.
          Hide
          Jonathan Coveney added a comment -

          I'm fine keeping this out of 0.9. The genesis of this patch was that it enables other things I'd like to do in trunk. 0.9 should probably really only have bug fixes and minor improvements at this point, keeping it stable, and we should make 0.10 the next bastion of the high tech pigs.

          Show
          Jonathan Coveney added a comment - I'm fine keeping this out of 0.9. The genesis of this patch was that it enables other things I'd like to do in trunk. 0.9 should probably really only have bug fixes and minor improvements at this point, keeping it stable, and we should make 0.10 the next bastion of the high tech pigs.
          Hide
          Thejas M Nair added a comment -

          PIG2430_2.010.patch - patch re-based with 0.10 branch .

          Show
          Thejas M Nair added a comment - PIG2430_2.010.patch - patch re-based with 0.10 branch .
          Hide
          Thejas M Nair added a comment -

          test-patch and unit tests pass. Committed the patches to 0.10 branch and trunk.
          Thanks Jonathan!

          Show
          Thejas M Nair added a comment - test-patch and unit tests pass. Committed the patches to 0.10 branch and trunk. Thanks Jonathan!

            People

            • Assignee:
              Jonathan Coveney
              Reporter:
              Jonathan Coveney
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development