Pig
  1. Pig
  2. PIG-1303

unable to set outgoing format for org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0, 0.8.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      pig 0.6.0 on a fedora linux machine, jdk 1.6 u11

      Description

      I'm unable to set the format of the outgoing date string in the constructor as it's supposed to work.
      The only way i could change the format was to change the default in the java class and rebuild piggybank.
      Apparently this has something to do with the way pig instantiates DateExtractor, quoting a replier on the mailing list:

      David Vrensk said:

      I ran into the same problem a couple of weeks ago, and
      played around with the code inserting some print/log statements. It turns
      out that the arguments are only used in the initial constructor calls, when
      the pig process is starting, but once pig reaches the point where it would
      use the udf, it creates new DateExtractors without passing the arguments.

      1. PIG-1303.patch
        9 kB
        Dmitriy V. Ryaboy
      2. ASF.LICENSE.NOT.GRANTED--TypeCheckingVisitor.java.diff
        0.6 kB
        Johannes Rußek

        Issue Links

          Activity

          Johannes Rußek created issue -
          Dmitriy V. Ryaboy made changes -
          Field Original Value New Value
          Assignee Dmitriy V. Ryaboy [ dvryaboy ]
          Hide
          Johannes Rußek added a comment -

          Hello everybody,
          this fixes it for us.
          Apparently due to the Typemagic the FuncSpec that is returned by our EvalFunc.getArgToFuncMapping() which doesn't contain the constructor arguments is used instead of the perfectly fine one in "func", which contains the constructor arguments.
          We've just copied the arguments from func to matchingSpec to make it work.
          Johannes

          Show
          Johannes Rußek added a comment - Hello everybody, this fixes it for us. Apparently due to the Typemagic the FuncSpec that is returned by our EvalFunc.getArgToFuncMapping() which doesn't contain the constructor arguments is used instead of the perfectly fine one in "func", which contains the constructor arguments. We've just copied the arguments from func to matchingSpec to make it work. Johannes
          Johannes Rußek made changes -
          Attachment TypeCheckingVisitor.java.diff [ 12441845 ]
          hc busy made changes -
          Link This issue blocks PIG-1386 [ PIG-1386 ]
          Hide
          hc busy added a comment -

          Okay, so, here's a thought:

          I'm kind of stuck writing the initial/intermed/Final methods for an algebraic EvalFunc that has constructor parameters because I couldn't pass the parameters in.

          A suggestion is to do this (without being incompatible with previous versions)

          Alter EvalFunc's profile so that

          public abstract class EvalFunc<T>  {
          
             protected handleChildConstructorParameters(Object... childConstructor){
                // by default do nothing.
             }
          
              public EvalFunc(Object... constructorParameters){
                  handleChildConstructorParameters(constructorParameters);
                  ... then do everything else it used to do.
              }
          }
          

          The reason why this is necessary is because I'll need to overrite handleChildConstructorParameters in my Algebraic EvalFunc to do some things before the rest of EvalFunc()'s constructor continues. This will help fix this date format problem for Algebraic evalfunc's.

          Show
          hc busy added a comment - Okay, so, here's a thought: I'm kind of stuck writing the initial/intermed/Final methods for an algebraic EvalFunc that has constructor parameters because I couldn't pass the parameters in. A suggestion is to do this (without being incompatible with previous versions) Alter EvalFunc's profile so that public abstract class EvalFunc<T> { protected handleChildConstructorParameters( Object ... childConstructor){ // by default do nothing. } public EvalFunc( Object ... constructorParameters){ handleChildConstructorParameters(constructorParameters); ... then do everything else it used to do . } } The reason why this is necessary is because I'll need to overrite handleChildConstructorParameters in my Algebraic EvalFunc to do some things before the rest of EvalFunc()'s constructor continues. This will help fix this date format problem for Algebraic evalfunc's.
          Hide
          Dmitriy V. Ryaboy added a comment -

          hc,
          I haven't gone through the code, but I think we should just try to do the same thing for initial, intermediate, and final functions, in terms of instantiation, as is already being done for the regular EvalFunc, and construct a better FuncSpec. I doubt we need the handleChildConstructorParams function.

          Show
          Dmitriy V. Ryaboy added a comment - hc, I haven't gone through the code, but I think we should just try to do the same thing for initial, intermediate, and final functions, in terms of instantiation, as is already being done for the regular EvalFunc, and construct a better FuncSpec. I doubt we need the handleChildConstructorParams function.
          Hide
          hc busy added a comment -

          But the problem is that inside the EvalFunc constructor, in case of Algebraic classes, it constructs each of Initial, Intermediate and final which are EvalFunc's that, in my case, require a parameter to operate correctly.

          If I declare the helper class that represent the initial/intermediate/final

          	public class HelperClass extends EvalFunc<Tuple> {
          		public HelperClass() {
          			super();
          		}
          
          		public Tuple exec(Tuple input) throws IOException {
          			return extreme(fieldIndex, sign, input, reporter);
          		}
          
          	}
          

          where the fieldIndex and sign come from the surrounding class (note the class is not static) then the code crashes. It's not able to construct the HelperClass with this error

          could not instantiate 'org.apache.pig.piggybank.evaluation.ExtremalTupleByNthField$HelperClass' with arguments 'null'
          java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.evaluation.ExtremalTupleByNthField$HelperClass' with arguments 'null'
          at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:498)
          at org.apache.pig.EvalFunc.getReturnTypeFromSpec(EvalFunc.java:136)
          at org.apache.pig.EvalFunc.<init>(EvalFunc.java:123)
          at org.apache.pig.piggybank.evaluation.ExtremalTupleByNthField.<init>(ExtremalTupleByNthField.java:77)
          at org.apache.pig.piggybank.evaluation.TestExtremalTupleByNthField.testMin(Unknown Source)
          Caused by: java.lang.InstantiationException: org.apache.pig.piggybank.evaluation.ExtremalTupleByNthField$HelperClass
          at java.lang.Class.newInstance0(Class.java:340)
          at java.lang.Class.newInstance(Class.java:308)
          at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:468)

          Basically, I think it's not able to construct because the class can only be constructed from an instance of ExtremalTupleByNthField.

                  ExtremalTupleByNthField etbnf = new ExtremalTupleByNthField("1","max");
                  etbnf.new ExtremalTupleByNthField.HelperClass();
          

          So my solution to this problem was to make this class static. But make it so that EvalFunc can take a vararg that will eventually contain the actual parameters.

          the handleChildConstructorParameters method in the EvalFunc will construct a string that represents the call into the initial/intermediate/final methods but it contains parameters that came from the ExtremalTupleByNthField.

          Show
          hc busy added a comment - But the problem is that inside the EvalFunc constructor, in case of Algebraic classes, it constructs each of Initial, Intermediate and final which are EvalFunc's that, in my case, require a parameter to operate correctly. If I declare the helper class that represent the initial/intermediate/final public class HelperClass extends EvalFunc<Tuple> { public HelperClass() { super (); } public Tuple exec(Tuple input) throws IOException { return extreme(fieldIndex, sign, input, reporter); } } where the fieldIndex and sign come from the surrounding class (note the class is not static) then the code crashes. It's not able to construct the HelperClass with this error could not instantiate 'org.apache.pig.piggybank.evaluation.ExtremalTupleByNthField$HelperClass' with arguments 'null' java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.evaluation.ExtremalTupleByNthField$HelperClass' with arguments 'null' at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:498) at org.apache.pig.EvalFunc.getReturnTypeFromSpec(EvalFunc.java:136) at org.apache.pig.EvalFunc.<init>(EvalFunc.java:123) at org.apache.pig.piggybank.evaluation.ExtremalTupleByNthField.<init>(ExtremalTupleByNthField.java:77) at org.apache.pig.piggybank.evaluation.TestExtremalTupleByNthField.testMin(Unknown Source) Caused by: java.lang.InstantiationException: org.apache.pig.piggybank.evaluation.ExtremalTupleByNthField$HelperClass at java.lang.Class.newInstance0(Class.java:340) at java.lang.Class.newInstance(Class.java:308) at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:468) Basically, I think it's not able to construct because the class can only be constructed from an instance of ExtremalTupleByNthField. ExtremalTupleByNthField etbnf = new ExtremalTupleByNthField( "1" , "max" ); etbnf. new ExtremalTupleByNthField.HelperClass(); So my solution to this problem was to make this class static. But make it so that EvalFunc can take a vararg that will eventually contain the actual parameters. the handleChildConstructorParameters method in the EvalFunc will construct a string that represents the call into the initial/intermediate/final methods but it contains parameters that came from the ExtremalTupleByNthField.
          Hide
          hc busy added a comment -

          Hmm, okay, so let me shorten my problem. Basically the functions

          getInitial, getIntermed, and getFinal in my Algebraic class doesn't have access to the constructor parameters. The reason is this. in Java, the super() constructor can only be called as the very first thing that the deriving class's constructor does, so my udfs has constructors that look like this:

          	 public ExtremalTupleByNthField(String fieldIndexString, String order) {
          		super();
                          parameters = "('"+fieldIndexString+"','"+order+"'";
                   }
                 @Override
          	public String getInitial() {
          		return HelperClass.class.getName()+parameters;
          	}
          

          But the problem is EvalFunc() constructor calls the child class's getInitial() to type check. When it does this, it finds that my getInitial() returns something in complete because the "parameters" member variable hasn't been initialized yet. This is a pretty mundane problem with java programs and the way to fix it is what I've submitted in the patch calling an overridden method in the super()'s constructor.

          I mean, I don't see any other way to do this, but I'd be willing to work on another implementation if you can suggest one?

          Show
          hc busy added a comment - Hmm, okay, so let me shorten my problem. Basically the functions getInitial, getIntermed, and getFinal in my Algebraic class doesn't have access to the constructor parameters. The reason is this. in Java, the super() constructor can only be called as the very first thing that the deriving class's constructor does, so my udfs has constructors that look like this: public ExtremalTupleByNthField( String fieldIndexString, String order) { super (); parameters = "('" +fieldIndexString+ "','" +order+ "'" ; } @Override public String getInitial() { return HelperClass.class.getName()+parameters; } But the problem is EvalFunc() constructor calls the child class's getInitial() to type check. When it does this, it finds that my getInitial() returns something in complete because the "parameters" member variable hasn't been initialized yet. This is a pretty mundane problem with java programs and the way to fix it is what I've submitted in the patch calling an overridden method in the super()'s constructor. I mean, I don't see any other way to do this, but I'd be willing to work on another implementation if you can suggest one?
          Hide
          Dmitriy V. Ryaboy added a comment -

          Right, I'm with you on what the problem is.

          My suggestion is to look at the FuncSpec of the parent EvalFunc, and if it includes parameters, assume an equivalent FuncSpec should be created for initial, intermediate, and final classes. Essentially do the same thing we do for regular EvalFuncs when they are DEFINEd to use parameterized constructors.

          You would then have to make sure that initial, intermediate, and final classes have the required constructors, but that's what you'd expect to have to do anyway.

          Show
          Dmitriy V. Ryaboy added a comment - Right, I'm with you on what the problem is. My suggestion is to look at the FuncSpec of the parent EvalFunc, and if it includes parameters, assume an equivalent FuncSpec should be created for initial, intermediate, and final classes. Essentially do the same thing we do for regular EvalFuncs when they are DEFINEd to use parameterized constructors. You would then have to make sure that initial, intermediate, and final classes have the required constructors, but that's what you'd expect to have to do anyway.
          Hide
          Johannes Rußek added a comment -

          I'm sorry, but is this really the problem with the DateExtractor UDF?
          what i've seen during debugging is that it implements getArgToFuncMapping which returns a funcspec without the constructor arguments. I've tried to look into it but i didn't find an easy way to figure the initial constructor arguments out when getArgToFuncMapping is being called.
          also, DateExtractor isn't even algebraic. i'm not sure, but what's wrong with keeping the constructor arguments from the first instantiation and apply them to any further ones, may it be algebraic child classes or pig type-specific implementations?

          Show
          Johannes Rußek added a comment - I'm sorry, but is this really the problem with the DateExtractor UDF? what i've seen during debugging is that it implements getArgToFuncMapping which returns a funcspec without the constructor arguments. I've tried to look into it but i didn't find an easy way to figure the initial constructor arguments out when getArgToFuncMapping is being called. also, DateExtractor isn't even algebraic. i'm not sure, but what's wrong with keeping the constructor arguments from the first instantiation and apply them to any further ones, may it be algebraic child classes or pig type-specific implementations?
          Hide
          Dmitriy V. Ryaboy added a comment -

          Johannes, it's not, it's just a related root cause (inconsistent use of UDF constructors)
          Keeping constructor arguments and applying them to all instantiations is exactly what I am advocating.

          Show
          Dmitriy V. Ryaboy added a comment - Johannes, it's not, it's just a related root cause (inconsistent use of UDF constructors) Keeping constructor arguments and applying them to all instantiations is exactly what I am advocating.
          Hide
          Johannes Rußek added a comment -

          alright, i got confused
          +1 then!

          Show
          Johannes Rußek added a comment - alright, i got confused +1 then!
          Dmitriy V. Ryaboy made changes -
          Attachment PIG-1303.patch [ 12442837 ]
          Hide
          Dmitriy V. Ryaboy added a comment -

          I think this qualifies as a bug fix and should be applied to 0.7 as well as going forward into 0.8

          Show
          Dmitriy V. Ryaboy added a comment - I think this qualifies as a bug fix and should be applied to 0.7 as well as going forward into 0.8
          Dmitriy V. Ryaboy made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 0.7.0 [ 12314397 ]
          Fix Version/s 0.8.0 [ 12314562 ]
          Hide
          Dmitriy V. Ryaboy added a comment -

          Attached a patch for both regular UDFs and Algebraics. It's essentially what Johannes posted, I just added a test (but see the rest of this comment), and applied the same logic to Algebraic EvalFunc construction.

          The test only really tests algebraic instantiation; I haven't been successful in reproducing the normal instantiation problem in test mode (which is probably why it's gone undetected so far). There's a test for it in my patch, but it's a bad one – the test actually passes even without the patch to the TypeCheckingVisitor.

          Johannes, could you try applying this patch and let us know if it fixes your DateExtractor problem?

          hc, this patch should unblock you for PIG-1386

          Show
          Dmitriy V. Ryaboy added a comment - Attached a patch for both regular UDFs and Algebraics. It's essentially what Johannes posted, I just added a test (but see the rest of this comment), and applied the same logic to Algebraic EvalFunc construction. The test only really tests algebraic instantiation; I haven't been successful in reproducing the normal instantiation problem in test mode (which is probably why it's gone undetected so far). There's a test for it in my patch, but it's a bad one – the test actually passes even without the patch to the TypeCheckingVisitor. Johannes, could you try applying this patch and let us know if it fixes your DateExtractor problem? hc, this patch should unblock you for PIG-1386
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442837/PIG-1303.patch
          against trunk revision 937570.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/302/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/302/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/302/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12442837/PIG-1303.patch against trunk revision 937570. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/302/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/302/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/302/console This message is automatically generated.
          Hide
          hc busy added a comment -

          +(google^2)

          that worked!

          Show
          hc busy added a comment - +(google^2) that worked!
          Hide
          Alan Gates added a comment -

          Dmitry, I'll try to get to reviewing this patch today.

          Show
          Alan Gates added a comment - Dmitry, I'll try to get to reviewing this patch today.
          Hide
          Alan Gates added a comment -

          Sorry, I didn't make it to reviewing this today. I'll put it at the top of tomorrow's list.

          On the 0.7 question, I'm open to that as long as we test it really well.

          Show
          Alan Gates added a comment - Sorry, I didn't make it to reviewing this today. I'll put it at the top of tomorrow's list. On the 0.7 question, I'm open to that as long as we test it really well.
          Hide
          Johannes Rußek added a comment -

          Dmitriy, i should be able to test your patch on my issues by tomorrow.

          Show
          Johannes Rußek added a comment - Dmitriy, i should be able to test your patch on my issues by tomorrow.
          Hide
          Alan Gates added a comment -

          In general code looks good. I think we should add a test for a function that uses the argsToFuncMapping mechanism since that's a totally separate code path from algebraic.

          Show
          Alan Gates added a comment - In general code looks good. I think we should add a test for a function that uses the argsToFuncMapping mechanism since that's a totally separate code path from algebraic.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Agreed - but as you can see, I added a test like that (testRegularInstantiation), except it always passes, even without the patch. Even though theoretically it shouldn't. Can you suggest a legitimate way of testing this, barring an actual cluster test? Johannes, maybe you have a testcase that's reproducible when using the MiniCluster?

          Show
          Dmitriy V. Ryaboy added a comment - Agreed - but as you can see, I added a test like that (testRegularInstantiation), except it always passes, even without the patch. Even though theoretically it shouldn't. Can you suggest a legitimate way of testing this, barring an actual cluster test? Johannes, maybe you have a testcase that's reproducible when using the MiniCluster?
          Hide
          Alan Gates added a comment -

          I think the testing you've done is probably adequate. We're looking at trying to start the release process for 0.7 next week, so let's get this checked in so it can make it.

          Show
          Alan Gates added a comment - I think the testing you've done is probably adequate. We're looking at trying to start the release process for 0.7 next week, so let's get this checked in so it can make it.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Committed to 0.7 and trunk. Thanks for the help, Johannes!

          Show
          Dmitriy V. Ryaboy added a comment - Committed to 0.7 and trunk. Thanks for the help, Johannes!
          Dmitriy V. Ryaboy made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Thejas M Nair made changes -
          Link This issue is duplicated by PIG-969 [ PIG-969 ]

            People

            • Assignee:
              Dmitriy V. Ryaboy
              Reporter:
              Johannes Rußek
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development