Uploaded image for project: 'Apache Trafodion (Retired)'
  1. Apache Trafodion (Retired)
  2. TRAFODION-2912

Non-deterministic scalar UDFs not executed once per row

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 2.0-incubating
    • 2.3
    • sql-cmp
    • None

    Description

      This problem was found by Andy Yang.

      Andy created a random generator scalar UDF and found that it did not return a different random value for each row:

      >>select scalar_rand_udf(), scalar_rand_udf()
      +>from (values (1), (2), (3)) T(s);
      
      RND RND 
      ----------- -----------
      
      846930886 1804289383
      846930886 1804289383
      846930886 1804289383
      
      --- 3 row(s) selected.
      >>
      

      Here is the explain, it shows that we are using hash joins, not nested joins, to evaluate the UDFs:

      >>explain options 'f' s;
      
      LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
      ---- ---- ---- --------------------  --------  --------------------  ---------
      
      5    .    6    root                                                  3.00E+000
      4    1    5    hybrid_hash_join                                      3.00E+000
      3    2    4    hybrid_hash_join                                      1.00E+000
      .    .    3    isolated_scalar_udf             SCALAR_RAND_UDF       1.00E+000
      .    .    2    isolated_scalar_udf             SCALAR_RAND_UDF       1.00E+000
      .    .    1    tuplelist                                             3.00E+000
      
      --- SQL operation complete.
      >>
      

      The problem is that we don't check for non-deterministic UDFs when we transform a TSJ to a regular join in the transformer or normalizer. We don't even set the non-deterministic flag in the group attributes of the IsolatedScalarUDF node.

      The fix is to set this flag correctly and to add a check and not transform routine joins for non-deterministic isolated scalar UDFs into a regular join.

      To recreate:

      Here is the source code of the UDF:

      #include "sqludr.h"
      #include <stdlib.h>
      
      SQLUDR_LIBFUNC SQLUDR_INT32 scalar_rand_udf(SQLUDR_INT32 *out1,
                                                  SQLUDR_INT16 *outInd1,
                                                  SQLUDR_TRAIL_ARGS)
      {
        if (calltype == SQLUDR_CALLTYPE_FINAL)
          return SQLUDR_SUCCESS;
      
        (*out1) = rand();
      
        return SQLUDR_SUCCESS;
      }
      

      Compile the UDF:

      gcc -g -Wall -I$TRAF_HOME/export/include/sql -shared -fPIC -o scalar_rand_udf.so scalar_rand_udf.c
      

      Create the UDF and run it:

      drop function scalar_rand_udf;
      drop library scalar_rand_udf_lib;
      
      create library scalar_rand_udf_lib
       file '/home/zellerh/src/scalar_rand_udf/scalar_rand_udf.so';
      
      create function scalar_rand_udf() returns (rnd int)
        external name 'scalar_rand_udf' library scalar_rand_udf_lib
        not deterministic no sql no transaction required;
      
      prepare s from
      select scalar_rand_udf(), scalar_rand_udf()
      from (values (1), (2), (3)) T(s);
      
      explain options 'f' s;
      execute s;
      

      Attachments

        Issue Links

          Activity

            People

              hzeller Hans Zeller
              hzeller Hans Zeller
              Votes:
              0 Vote for this issue
              Watchers:
              2 Stop watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Issue deployment