Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 1.16.0
    • 1.17.0
    • None

    Description

      The aim of this Jira is to add support for vararg UDFs. This simplifies creating UDFs that must accept a varying number of arguments.

      Requirements for vararg UDFs:

      • It should be possible to register vararg UDFs with the same name but different argument types;
      • Only vararg UDFs with a single variable-length argument placed after all other arguments should be allowed;
      • A vararg UDF should have lower priority than a regular UDF when both are suitable;
      • In addition to simple functions, vararg support should be added to aggregate functions.

      Implementation details

      The lifecycle of a UDF is as follows:

      • The UDF is validated in the FunctionConverter class. If validation succeeds (the UDF has the required fields with the required types, the required annotations, etc.), it is converted to a DrillFuncHolder and registered in the function registry. Corresponding SqlFunction instances are also created from the DrillFuncHolder for use in Calcite;
      • When a query uses this UDF, Calcite validates that a UDF with the required name, number of arguments and argument types exists (for Drill functions, argument types are not checked at this stage);
      • Once Calcite has found the required SqlFunction instance, Drill is used to find the matching DrillFuncHolder. All the work of determining the most suitable function is done in FunctionResolver and in TypeCastRules.getCost();
      • At the execution stage, the DrillFuncHolder is looked up again using the FunctionCall instance;
      • The DrillFuncHolder is used for code generation.

      Given these steps, the first change needed to support vararg UDFs is to update the logic in FunctionConverter so that vararg UDFs can be registered, subject to the requirements listed above.
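      A minimal sketch of the vararg-specific registration checks, assuming a hypothetical ParamInfo type that stands in for a reflected @Param field (only its name and whether it is declared as an array). The real FunctionConverter inspects the UDF class itself and performs many more checks; this only encodes the "exactly one vararg parameter, declared after all other parameters" requirement.

```java
import java.util.List;

// Hypothetical sketch of vararg checks; not Drill's FunctionConverter code.
final class VarArgValidation {

  /** Hypothetical stand-in for a @Param field: its name and whether it is an array. */
  static final class ParamInfo {
    final String name;
    final boolean isArray;

    ParamInfo(String name, boolean isArray) {
      this.name = name;
      this.isArray = isArray;
    }
  }

  /** Returns null if the parameters are acceptable, otherwise an error message. */
  static String validate(boolean isVarArg, List<ParamInfo> params) {
    if (!isVarArg) {
      // Regular UDFs: vararg rules do not apply (other checks are out of scope here).
      return null;
    }
    int arrayCount = 0;
    for (ParamInfo p : params) {
      if (p.isArray) {
        arrayCount++;
      }
    }
    if (arrayCount != 1) {
      return "a vararg UDF must declare exactly one vararg (array) parameter";
    }
    // arrayCount == 1 guarantees params is non-empty here.
    if (!params.get(params.size() - 1).isArray) {
      return "the vararg parameter must be declared after all other parameters";
    }
    return null;
  }
}
```

      For example, (sep: VarCharHolder, inputs: VarCharHolder[]) passes, while declaring inputs first, or declaring two array parameters, is rejected.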

      Calcite uses SqlOperandTypeChecker to verify the number of arguments, so Drill should provide its own implementation for vararg UDFs. To determine whether a UDF is vararg, a new isVarArg property will be added to FunctionTemplate.
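      The operand-count rule such a checker has to enforce can be sketched as plain arithmetic. VarArgOperandCount is a hypothetical class, not Calcite's SqlOperandTypeChecker. It assumes the vararg part may be empty (the limitations below require UDFs to handle an empty vararg field), so the minimum is the number of fixed parameters and there is no maximum.

```java
// Hypothetical operand-count rule; not Calcite's SqlOperandTypeChecker.
final class VarArgOperandCount {
  // Parameters before the vararg one, or all parameters of a regular UDF.
  private final int fixedArgs;
  private final boolean isVarArg;

  VarArgOperandCount(int fixedArgs, boolean isVarArg) {
    this.fixedArgs = fixedArgs;
    this.isVarArg = isVarArg;
  }

  /** Minimum operand count: the vararg part may be empty. */
  int min() {
    return fixedArgs;
  }

  /** Maximum operand count; -1 stands for "unbounded" in this sketch. */
  int max() {
    return isVarArg ? -1 : fixedArgs;
  }

  /** True if a call with operandCount arguments passes the count check. */
  boolean accepts(int operandCount) {
    if (operandCount < fixedArgs) {
      return false;
    }
    return isVarArg || operandCount == fixedArgs;
  }
}
```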

      The TypeCastRules.getCost() method should be updated so that it can find vararg UDFs while still prioritizing regular UDFs.
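      A sketch of the intended prioritization, assuming exact type matches only and a hypothetical fixed VARARG_PENALTY. The real TypeCastRules.getCost() also accounts for implicit casts and nullability, and the weighting it actually uses for vararg candidates may differ.

```java
import java.util.List;

// Hypothetical cost-based resolution; types are plain strings, exact matches only.
final class VarArgResolution {

  /** Hypothetical candidate; for a vararg one the last type is the vararg element type. */
  static final class Candidate {
    final List<String> paramTypes;
    final boolean isVarArg;

    Candidate(List<String> paramTypes, boolean isVarArg) {
      this.paramTypes = paramTypes;
      this.isVarArg = isVarArg;
    }
  }

  // Assumed penalty that makes a vararg match cost more than a regular one.
  static final int VARARG_PENALTY = 1;

  /** Cost of calling c with argTypes, or -1 if c does not match. */
  static int cost(Candidate c, List<String> argTypes) {
    int fixed = c.isVarArg ? c.paramTypes.size() - 1 : c.paramTypes.size();
    if (argTypes.size() < fixed || (!c.isVarArg && argTypes.size() != fixed)) {
      return -1;
    }
    for (int i = 0; i < argTypes.size(); i++) {
      // Arguments past the fixed ones are matched against the vararg element type.
      String expected = i < fixed ? c.paramTypes.get(i) : c.paramTypes.get(fixed);
      if (!expected.equals(argTypes.get(i))) {
        return -1;
      }
    }
    return c.isVarArg ? VARARG_PENALTY : 0;
  }

  /** Cheapest matching candidate, or null if none matches. */
  static Candidate resolve(List<Candidate> candidates, List<String> argTypes) {
    Candidate best = null;
    int bestCost = Integer.MAX_VALUE;
    for (Candidate c : candidates) {
      int cost = cost(c, argTypes);
      if (cost >= 0 && cost < bestCost) {
        best = c;
        bestCost = cost;
      }
    }
    return best;
  }
}
```

      With a regular concat(VARCHAR, VARCHAR) and a vararg concat(VARCHAR...) registered under the same name, a two-argument call picks the regular function, while calls with any other number of VARCHAR arguments fall back to the vararg one.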

      Code generation logic should be updated to handle vararg UDFs. The generated code for a vararg argument will look like this:

                        NullableVarCharHolder[] inputs = new NullableVarCharHolder[3];
                        inputs[0] = out14;
                        inputs[1] = out19;
                        inputs[2] = out24;
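      The snippet above has a simple shape: one array allocation sized to the number of vararg operands, then one assignment per operand holder. The string-building emitter below (VarArgCodeGen.emitVarArgArray is a hypothetical name) only illustrates that shape; Drill's own generator builds code through a code model rather than by concatenating strings.

```java
import java.util.List;

// Hypothetical emitter reproducing the shape of the generated vararg setup code.
final class VarArgCodeGen {

  /** Emits "T[] name = new T[n];" followed by one "name[i] = var;" line per holder. */
  static String emitVarArgArray(String holderType, String arrayName, List<String> holderVars) {
    StringBuilder sb = new StringBuilder();
    sb.append(holderType).append("[] ").append(arrayName)
        .append(" = new ").append(holderType)
        .append('[').append(holderVars.size()).append("];\n");
    for (int i = 0; i < holderVars.size(); i++) {
      sb.append(arrayName).append('[').append(i).append("] = ")
          .append(holderVars.get(i)).append(";\n");
    }
    return sb.toString();
  }
}
```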
      

      To create your own vararg UDF, set the new isVarArg property to true in FunctionTemplate.
      Then declare the vararg input as an array.

      Here is an example of a vararg UDF:

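        import io.netty.buffer.DrillBuf;
        import javax.inject.Inject;
        import org.apache.drill.exec.expr.DrillSimpleFunc;
        import org.apache.drill.exec.expr.annotations.FunctionTemplate;
        import org.apache.drill.exec.expr.annotations.Output;
        import org.apache.drill.exec.expr.annotations.Param;
        import org.apache.drill.exec.expr.holders.VarCharHolder;

        @FunctionTemplate(name = "concat_varchar",
                          isVarArg = true,
                          scope = FunctionTemplate.FunctionScope.SIMPLE)
        public class VarCharConcatFunction implements DrillSimpleFunc {
          // Vararg input: declared as an array of value holders.
          @Param VarCharHolder[] inputs;
          @Output VarCharHolder out;
          @Inject DrillBuf buffer;

          @Override
          public void setup() {
          }

          @Override
          public void eval() {
            // First pass: total length of all inputs (the array may be empty).
            int length = 0;
            for (VarCharHolder input : inputs) {
              length += input.end - input.start;
            }
            // Make sure the output buffer can hold the concatenated bytes.
            out.buffer = buffer = buffer.reallocIfNeeded(length);
            out.start = out.end = 0;
            // Second pass: copy each input's bytes in argument order.
            for (VarCharHolder input : inputs) {
              for (int id = input.start; id < input.end; id++) {
                out.buffer.setByte(out.end++, input.buffer.getByte(id));
              }
            }
          }
        }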
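      The eval() logic above can be exercised outside Drill with a plain-Java simulation. SimpleVarCharHolder is a hypothetical stand-in for VarCharHolder that uses a byte[] instead of a DrillBuf. Java's own varargs syntax (...) is used because it also delivers the arguments as an array, including an empty one.

```java
import java.nio.charset.StandardCharsets;

// Plain-Java simulation of VarCharConcatFunction.eval(); no Drill classes involved.
final class ConcatSimulation {

  /** Hypothetical stand-in for VarCharHolder: the value is buffer[start, end). */
  static final class SimpleVarCharHolder {
    final byte[] buffer;
    final int start;
    int end;

    SimpleVarCharHolder(byte[] buffer, int start, int end) {
      this.buffer = buffer;
      this.start = start;
      this.end = end;
    }

    static SimpleVarCharHolder of(String s) {
      byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
      return new SimpleVarCharHolder(bytes, 0, bytes.length);
    }

    String asString() {
      return new String(buffer, start, end - start, StandardCharsets.UTF_8);
    }
  }

  static SimpleVarCharHolder concat(SimpleVarCharHolder... inputs) {
    // First pass: total length (0 when the vararg list is empty).
    int length = 0;
    for (SimpleVarCharHolder input : inputs) {
      length += input.end - input.start;
    }
    // Allocate once, playing the role of reallocIfNeeded(length) in the UDF.
    SimpleVarCharHolder out = new SimpleVarCharHolder(new byte[length], 0, 0);
    // Second pass: copy bytes in argument order, honoring each input's offsets.
    for (SimpleVarCharHolder input : inputs) {
      for (int id = input.start; id < input.end; id++) {
        out.buffer[out.end++] = input.buffer[id];
      }
    }
    return out;
  }
}
```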
      

      Limitations of vararg UDFs:

      • The null handling specified in FunctionTemplate does not affect vararg parameters, so the user should provide separate UDFs with non-nullable and nullable value-holder vararg fields;
      • For value-holder vararg fields, vararg UDFs support only arguments of the same type, including nullability. If the vararg field is a FieldReader, all responsibility for handling the types and nullability of the input vararg values is placed on the UDF implementation;
      • Scalar replacement does not happen for vararg arguments;
      • The UDF implementation should handle the case when the vararg field is empty.
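      Because FunctionTemplate null handling is not applied to vararg parameters, a UDF over nullable holders has to check each input itself. The sketch below assumes NULL-if-any-input-is-NULL semantics and that an empty vararg list yields an empty, non-null string. NullableStringHolder is a hypothetical simplification of NullableVarCharHolder: it keeps the isSet flag but stores a String.

```java
// Hypothetical sketch of manual null handling for a nullable vararg concat.
final class NullableVarArgSketch {

  /** Hypothetical simplification of NullableVarCharHolder: isSet == 0 means SQL NULL. */
  static final class NullableStringHolder {
    int isSet;
    String value;

    static NullableStringHolder of(String v) {
      NullableStringHolder h = new NullableStringHolder();
      h.isSet = v == null ? 0 : 1;
      h.value = v;
      return h;
    }
  }

  static NullableStringHolder concat(NullableStringHolder... inputs) {
    NullableStringHolder out = new NullableStringHolder();
    StringBuilder sb = new StringBuilder();
    for (NullableStringHolder input : inputs) {
      if (input.isSet == 0) {
        // Any NULL input makes the result NULL (assumed semantics).
        out.isSet = 0;
        out.value = null;
        return out;
      }
      sb.append(input.value);
    }
    // Reached when no input is NULL, including an empty vararg list.
    out.isSet = 1;
    out.value = sb.toString();
    return out;
  }
}
```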

      For documentation
      New functions: collect_to_list, TBA.


            People

              Vova Vysotskyi
              Vova Vysotskyi
              Arina Ielchiieva
              Votes: 0
              Watchers: 2
