|
Can you add a test case, Tom?
For adding a test case, please add a "my.q" file in hive/ql/src/test/queries/positive, and then run "ant test -Dtestcase=TestParse -Dqfile=my.q", collect the standard result by searching for the failed "diff" command in the test log. Another thing is that do you think AdminWork will be responsible for other works besides creating UDF? if not then let's call it a more specific name (like CreateUDFWork). Hi Tom,
The reason for that code was that user could specify the list of auxiliary jars in hive-default.xml (and also by setting the hive.aux.jars.path in hive command line client as a session variable). In which case GenericOptionsParser code doesn't work. I think appending 'file://' is new to 19. Can you persist this mapping in metastore? otherwise create needs to be specified everytime hive session is brought up. And also management functions such as drop, rename and some such needs to be provided. Another alternative (as an initial implementation which is simple but sufficient for now) is to use reflection to load the function class. The function name and class name can be same and the package could be predefined (org.apache.hadoop.hive.udf) or configured and at runtime this class would be loaded. cool...
Actually, could this be made a part of DDLWork instead of AdminWork and then it can all be launched from the DDLTask. We have different DDL descriptors in DDL Work for things like create/drop/alter tables and the functions could also belong to that. Having a DDLFunctionsWork is also fine. I do think we should be more specific than admin. Also there is this whole thing of making this mapping persistent(both Raghu and Prasad were mentioning that) so that they can be reused across sessions. Maybe we can have that discussion here as well. Thanks for all your feedback. I'm working on a test case.
The reason I called it AdminWork was because I was following the terminology used in MySQL. "DDLWork" and "DDLFunctionsWork" don't sound right as function creation is not really a part of DDL. I can rename it something more specific - CreateFunctionWork? I agree that it would be good to make the mapping persistent, and add drop and rename, but these can be done in later Jiras.
Not sure about that, but You are correct that we do need to preppend file://
Not exactly true about treating create/rename/alter functions as part of the DDL. Most DBs bin them in that bucket. Case to point is the mysql doc (link below) which clearly lists create function in the DDL section.
http://dev.mysql.com/doc/refman/5.0/en/sql-syntax-data-definition.html I am not too hung up on this though. FunctionWork(instead of CreateFunctionWork) is also fine with me - though I would like to have that bin all the different descriptors for manipulating functions rather than create a different work class for each manipulation (create/alter/rename/drop etc...) We can start of this with being non persistent but then in the syntax it should be made explicit that this is non persistent. One proposal could be CREATE TEMPORARY FUNCTION blah AS blah Also for temporary functions, reflection is also another way to structure this as Prasad pointed out but then that would not work if we allow non Java functions, so I think having syntax is useful, though we could optimize the java case to the point that all it means to create a new function is to drop in a jar which implements the udf interfaces (including something like a displayName() interface which would allow the language name to be different from the implementation name. Here's an updated patch that addresses some of the comments.
+1
Looks good. Thanks for the contribution. Only comment that I have is that in the patch you have duplicated the code for compile-test in ql/build.xml Would be better if instead we introduced a new target for building the udf jar (on similar lines as gen-test) and have a dummy gen-test-jar in build-common.xml which gets overwritten in ql/build.xml... Its upto you if you want to incorporate that... But this is good as is... Thanks again for the contribution... keep them rolling +1
This diff probably conflicts with Ashish's diff on explain (which merged UDFRegistry and UDAFRegistry). We should find a way to merge them. +1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12390274/hadoop-4070-v2.patch against trunk revision 696551. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3294/testReport/ This message is automatically generated. +1
Looks good. As Zheng points out, this will conflict with my changes, but we can work that out at commit time. The conflict will be on line 45 in FunctionTask.java - instead of UDFRegistry we have FunctionRegistry now.
We can fix explain plan for this later... Cancelling as this is going to not compile now since Dhrubha has already checked in my txn of explain plan.
Resolved conflict that Tom's patch had with explain plan and added an explain plan test.
Tom I hope you don't mind me doing this as otherwise I think this may not make it for tomorrows 0.19 code freeze resolved conflicts with my explain plan transaction.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12390402/hadoop-4070-v4.patch against trunk revision 696846. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3315/testReport/ This message is automatically generated. Integrated in Hadoop-trunk #611 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/611/
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
For example you can do
then use the my_udf function just like any other function.
The class defining the UDF (com.example.MyUdf) should be packed in a jar file then made available to hive by setting the HIVE_AUX_JARS_PATH environment variable (which specifies the directory that the jar is in).
Note that there is a change in ExecDriver which removes some code that was setting the "tmpjars" property incorrectly. The problem was that the paths were not have the "file://" scheme prepended to them as GenericOptionsParser does, which caused the paths to be interpreted as HDFS paths. By removing the code GenericOptionsParser sets the variable correctly.
I have tested it manually on a simple UDF, the next step is to write some automated tests.