I propose to allow user-defined functions (UDFs) to read from an initialization context during construction. The initialization context would be a new Java interface, UdfInitializer, that provides, among other things, a type factory and the values of those arguments to the function call that are literals.
The purpose of this feature is to allow functions to do more work at initialization time and less work on each invocation. Suppose I wanted to write a UDF regexMatch(pattern, string) that matches Java regular expressions. If pattern is a literal, I would like to create an instance of the function object that calls Pattern.compile(pattern) in its constructor and stores the resulting Pattern object as a field. Each invocation of the function can use that Pattern object, and does not have to pay the cost of compilation.
In order to use this feature, a UDF class would have a public constructor with a single argument that is a UdfInitializer. The method that invokes the function, conventionally called eval, must be non-static.
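A minimal sketch of what such a UDF could look like, using the regexMatch example. UdfInitializer does not exist yet, so the cut-down interface below and its method names (isLiteral, literalValue) are assumptions made for illustration, as is the class name RegexMatchFunction. The type factory is omitted.

```java
import java.util.regex.Pattern;

// Hypothetical, cut-down stand-in for the proposed interface.
// The method names are assumptions; the proposal does not fix them.
interface UdfInitializer {
  /** Returns whether the argument at the given ordinal is a literal. */
  boolean isLiteral(int ordinal);

  /** Returns the value of a literal argument; call only if isLiteral. */
  Object literalValue(int ordinal);
}

/** Sketch of regexMatch(pattern, string). */
class RegexMatchFunction {
  // Non-null only when the pattern argument was a literal.
  private final Pattern compiled;

  // Public single-argument constructor, as the contract requires.
  public RegexMatchFunction(UdfInitializer init) {
    this.compiled = init.isLiteral(0)
        ? Pattern.compile((String) init.literalValue(0))
        : null;
  }

  // Non-static, so that it can read the field set by the constructor.
  public boolean eval(String pattern, String string) {
    // Fall back to compiling per call when the pattern was not a literal.
    Pattern p = compiled != null ? compiled : Pattern.compile(pattern);
    return p.matcher(string).matches();
  }
}

class RegexMatchDemo {
  public static void main(String[] args) {
    // Stand-in for what the code generator would supply for
    // regexMatch('a+b', s): argument 0 is a literal, argument 1 is not.
    UdfInitializer init = new UdfInitializer() {
      public boolean isLiteral(int ordinal) { return ordinal == 0; }
      public Object literalValue(int ordinal) { return "a+b"; }
    };
    RegexMatchFunction f = new RegexMatchFunction(init);
    System.out.println(f.eval("a+b", "aaab")); // true
    System.out.println(f.eval("a+b", "abc"));  // false
  }
}
```

The fallback in eval keeps the function correct when the pattern is a column or an expression rather than a literal; only the literal case avoids the per-call compilation.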
This feature is optional. A UDF that has a public constructor with zero arguments (which is the current contract for non-static UDFs) will continue to work. The class MyPlusFunction is an example of this kind of UDF.
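One way the runtime could honor both contracts is to look for the UdfInitializer constructor first and fall back to the zero-argument one. The sketch below uses plain Java reflection. The names UdfInstantiator, LegacyPlus and InitAwarePlus are hypothetical, and UdfInitializer is reduced to an empty placeholder because its methods do not matter here.

```java
// Hypothetical placeholder for the proposed interface.
interface UdfInitializer { }

/** UDF written against the current contract: public zero-arg constructor. */
class LegacyPlus {
  public LegacyPlus() { }
  public int eval(int x, int y) { return x + y; }
}

/** UDF written against the proposed contract. */
class InitAwarePlus {
  final boolean sawInitializer;
  public InitAwarePlus(UdfInitializer init) {
    this.sawInitializer = init != null;
  }
  public int eval(int x, int y) { return x + y; }
}

class UdfInstantiator {
  /** Prefers the (UdfInitializer) constructor, else the zero-arg one. */
  static Object instantiate(Class<?> udfClass, UdfInitializer init) {
    try {
      try {
        return udfClass.getConstructor(UdfInitializer.class).newInstance(init);
      } catch (NoSuchMethodException e) {
        // No initializer-aware constructor: use the existing contract.
        return udfClass.getConstructor().newInstance();
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(
          "Cannot instantiate UDF " + udfClass.getName(), e);
    }
  }

  public static void main(String[] args) {
    UdfInitializer init = new UdfInitializer() { };
    LegacyPlus a = (LegacyPlus) instantiate(LegacyPlus.class, init);
    InitAwarePlus b = (InitAwarePlus) instantiate(InitAwarePlus.class, init);
    System.out.println(a.eval(1, 2));
    System.out.println(b.sawInitializer);
  }
}
```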
This feature would apply to all UDFs, including table functions (i.e. those whose arguments are tables or which return tables) and aggregate functions.
The initialization context would not affect the type-derivation aspects of the function. The return type, operand types, and so forth will already have been derived at validate time, and that derivation is complete well before any code is generated or executed. If you want to control type derivation, you should create your own sub-class of SqlOperator, as today.
There are some implementation challenges:
- The code generator will need to generate an instance of UdfInitializer for each UDF call that occurs in the query. Some data structures that are readily available at validate time (e.g. RexCall) are not easily re-created at run time, so we should be conservative about what information is available via UdfInitializer.
- The code generator must ensure that those instances are constructed exactly once during the execution of the query; those instances should not be variables in the execute method, but should instead be fields, or perhaps static fields, in the generated class.
- This functionality needs to work through both the interpreter (Bindable convention) and generated code (Enumerable convention).
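To make the "exactly once" requirement concrete, here is a hand-written stand-in for what a generated class might look like. It is not actual generator output: GeneratedQuery, PrefixFunction, the single-method UdfInitializer and the construction counter are all hypothetical, and are there only to show the UDF instance living in a field rather than in a local variable of the execute method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical single-method stand-in for the proposed interface.
interface UdfInitializer {
  /** Returns the literal value of the argument, or null if not a literal. */
  Object literalValue(int ordinal);
}

/** Hypothetical UDF prefix(p, s) that caches a literal first argument. */
class PrefixFunction {
  static int constructed = 0; // instrumentation for this sketch only
  private final String cachedPrefix;

  public PrefixFunction(UdfInitializer init) {
    constructed++;
    this.cachedPrefix = (String) init.literalValue(0);
  }

  public String eval(String prefix, String s) {
    return (cachedPrefix != null ? cachedPrefix : prefix) + s;
  }
}

/** Stand-in for a generated class evaluating prefix('> ', row). */
class GeneratedQuery {
  // One initializer and one UDF instance per call site, held in fields
  // so that they are built once, not once per row.
  private final UdfInitializer init0 = ordinal -> ordinal == 0 ? "> " : null;
  private final PrefixFunction f0 = new PrefixFunction(init0);

  List<String> execute(List<String> rows) {
    List<String> out = new ArrayList<>();
    for (String row : rows) {
      out.add(f0.eval("> ", row)); // reuses f0 for every row
    }
    return out;
  }

  public static void main(String[] args) {
    GeneratedQuery q = new GeneratedQuery();
    System.out.println(q.execute(Arrays.asList("a", "b", "c")));
    System.out.println("constructed: " + PrefixFunction.constructed);
  }
}
```

Whether the field should be an instance field or a static field depends on how long the generated class lives relative to the query, which this sketch leaves open.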