Derby
  1. Derby
  2. DERBY-5901

You can declare user-defined functions which shadow builtin functions by the same name.

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 10.10.1.1
    • Fix Version/s: None
    • Component/s: SQL
    • Urgency:
      Normal
    • Issue & fix info:
      Repro attached
    • Bug behavior facts:
      Deviation from standard, Wrong query result

      Description

      You can override a Derby builtin function by creating a function with the same name. This can give rise to wrong results.

      Consider the following user code:

      public class FakeSin
      {
      public static Double sin( Double input )

      { return new Double( 3.0 ); }

      }

      Now run the following script:

      connect 'jdbc:derby:memory:db;create=true';

      values sin( 0.5 );

      create function sin( a double ) returns double language java parameter style java no sql external name 'FakeSin.sin';

      values sin( 0.5 );
      values sin( 0.5 );

      Note the following:

      1) The first invocation of sin() returns the expected result.

      2) You are allowed to create a user-defined function named "sin" which can shadow the builtin function.

      3) The second invocation of sin() returns the result of running the builtin function. This is because the second invocation is character-for-character identical to the first, so Derby just uses the previously prepared statement.

      4) But the third invocation of sin() returns the result of running the user-defined function. Note that the third invocation has an extra space in it, which causes Derby to compile it from scratch, picking up the user-defined function instead of the builtin one.

        Issue Links

          Activity

          Rick Hillegas created issue -
          Hide
          Rick Hillegas added a comment -

          Behavior is different if you try to override one of the SQL Standard operators like abs. Consider the following script:

          connect 'jdbc:derby:memory:db;create=true';

          create function abs( a double ) returns double language java parameter style java no sql external name 'FakeSin.sin';

          values abs( 0.5 );
          values app.abs( 0.5 );

          The first invocation of abs() returns the result of the builtin operator. The second returns the result of the user-defined function.

          Show
          Rick Hillegas added a comment - Behavior is different if you try to override one of the SQL Standard operators like abs. Consider the following script: connect 'jdbc:derby:memory:db;create=true'; create function abs( a double ) returns double language java parameter style java no sql external name 'FakeSin.sin'; values abs( 0.5 ); values app.abs( 0.5 ); The first invocation of abs() returns the result of the builtin operator. The second returns the result of the user-defined function.
          Hide
          Rick Hillegas added a comment -

          Marking this issue as "deviation from standard". The SQL Standard addresses the ABS case by making ABS a reserved keyword. That is, if Derby followed the Standard in this matter, then you would not be allowed to declare a user-defined function with the name ABS. The ABS case arises in Derby because Derby treats ABS as a non-reserved keyword. Changing ABS to a reserved keyword would create backward-compatibility problems.

          Show
          Rick Hillegas added a comment - Marking this issue as "deviation from standard". The SQL Standard addresses the ABS case by making ABS a reserved keyword. That is, if Derby followed the Standard in this matter, then you would not be allowed to declare a user-defined function with the name ABS. The ABS case arises in Derby because Derby treats ABS as a non-reserved keyword. Changing ABS to a reserved keyword would create backward-compatibility problems.
          Rick Hillegas made changes -
          Field Original Value New Value
          Bug behavior facts Wrong query result [ 10366 ] Deviation from standard,Wrong query result [ 10367, 10366 ]
          Hide
          Rick Hillegas added a comment -

          Here are some suggestions for how to address this issue:

          1) We could make it illegal to declare a function whose unqualifed name would conflict with the name of a builtin Derby function or operator. Existing functions which violate this rule could be marked as invalid.

          2) We could make all of these user-defined functions behave like the ABS case: Without a schema qualifier, the name would resolve to the builtin function or operator. To specify the user-defined function, you would have to qualify it with its schema name.

          Here is my analysis of how these solutions would affect existing applications:

          (1)

          + Existing code which invokes the user-defined functions would flat-out break, forcing the application writer to fix the problem.

          • Fixing the problem would be a two part process: i) drop the offending function and recreate it with a non-conflicting name, then
            ii) update existing invocations of the function across the application.

          (2)

          • Existing code which invokes the user-defined function might silently start using the builtin function/operator rather than the user-defined function. This could produce wrong results.
            + Fixing the problem would just be a matter of adding a schema qualification to existing invocations.

          Preferences? Other ideas? Thanks.

          Show
          Rick Hillegas added a comment - Here are some suggestions for how to address this issue: 1) We could make it illegal to declare a function whose unqualifed name would conflict with the name of a builtin Derby function or operator. Existing functions which violate this rule could be marked as invalid. 2) We could make all of these user-defined functions behave like the ABS case: Without a schema qualifier, the name would resolve to the builtin function or operator. To specify the user-defined function, you would have to qualify it with its schema name. Here is my analysis of how these solutions would affect existing applications: (1) + Existing code which invokes the user-defined functions would flat-out break, forcing the application writer to fix the problem. Fixing the problem would be a two part process: i) drop the offending function and recreate it with a non-conflicting name, then ii) update existing invocations of the function across the application. (2) Existing code which invokes the user-defined function might silently start using the builtin function/operator rather than the user-defined function. This could produce wrong results. + Fixing the problem would just be a matter of adding a schema qualification to existing invocations. Preferences? Other ideas? Thanks.
          Rick Hillegas made changes -
          Link This issue is related to DERBY-672 [ DERBY-672 ]
          Hide
          Knut Anders Hatlen added a comment -

          I think I'd prefer option (1). The compatibility issues sound more manageable with that option (a failure is easier to detect than silently changing the results). Also, option (2) opens for the possibility that you can create a function which is shadowed by a built-in function, and if you don't know that there is a built-in with the same name, you may unknowingly call the wrong method in your SQL queries.

          Show
          Knut Anders Hatlen added a comment - I think I'd prefer option (1). The compatibility issues sound more manageable with that option (a failure is easier to detect than silently changing the results). Also, option (2) opens for the possibility that you can create a function which is shadowed by a built-in function, and if you don't know that there is a built-in with the same name, you may unknowingly call the wrong method in your SQL queries.
          Hide
          Kathey Marsden added a comment -

          I think the most likely trouble for existing applications with option 1 would be if someone implemented a function before it was added as a builtin function in Derby later. For example these added in 10.3:
          https://issues.apache.org/jira/browse/DERBY-1808

          Similarly if an application written against 10.9 creates a function by the same name as a builtin function that is added sometime in the future, it would also break when upgraded to the new derby version which has the new builtin function.

          I am not sure what the correct answer is but wonder if the standard speaks to this point wrt non-reserved words like SIN and what other database products do. I am concerned about introducing an incompatibility that is not specific to one release but has the potential to create an incompatibility for every function added at a time when the likely affected applications are quite old and possibly don't have developers ready to investigate and fix such an issue.

          Show
          Kathey Marsden added a comment - I think the most likely trouble for existing applications with option 1 would be if someone implemented a function before it was added as a builtin function in Derby later. For example these added in 10.3: https://issues.apache.org/jira/browse/DERBY-1808 Similarly if an application written against 10.9 creates a function by the same name as a builtin function that is added sometime in the future, it would also break when upgraded to the new derby version which has the new builtin function. I am not sure what the correct answer is but wonder if the standard speaks to this point wrt non-reserved words like SIN and what other database products do. I am concerned about introducing an incompatibility that is not specific to one release but has the potential to create an incompatibility for every function added at a time when the likely affected applications are quite old and possibly don't have developers ready to investigate and fix such an issue.
          Hide
          Rick Hillegas added a comment -

          The Standard only addresses the situation where the operator/function name is a reserved word. For instance, DATE is a reserved word under the Standard and so an application should not be allowed to create a function called DATE--Derby allows this and this creates a conflict with the builtin Derby DATE operator.

          The Standard is silent on operator/function names which are not reserved words. So, for instance, the Standard has nothing to say about whether an application can create a function called ABS or SIN. Note that these are both the names of JDBC escape functions. We would not have a problem with these operator names if Derby only invoked them via JDBC escape syntax.

          But like other RDBMSes, Derby can declare its own operators on top of those defined by the Standard. As we go forward, we should be careful to not enlarge the problem described by this JIRA. That is, if we add new operators, we should make sure that they cannot collide with the names of application-defined functions.

          I don't see how to address this issue without introducing a backward incompatibility. Regardless of the approach we take, we could add a scrap of upgrade logic which looks for conflicting application-defined function names. If we find a conflict, we could fail the upgrade with a message advising the user that the conflict needs to be fixed first.

          Thanks,
          -Rick

          Show
          Rick Hillegas added a comment - The Standard only addresses the situation where the operator/function name is a reserved word. For instance, DATE is a reserved word under the Standard and so an application should not be allowed to create a function called DATE--Derby allows this and this creates a conflict with the builtin Derby DATE operator. The Standard is silent on operator/function names which are not reserved words. So, for instance, the Standard has nothing to say about whether an application can create a function called ABS or SIN. Note that these are both the names of JDBC escape functions. We would not have a problem with these operator names if Derby only invoked them via JDBC escape syntax. But like other RDBMSes, Derby can declare its own operators on top of those defined by the Standard. As we go forward, we should be careful to not enlarge the problem described by this JIRA. That is, if we add new operators, we should make sure that they cannot collide with the names of application-defined functions. I don't see how to address this issue without introducing a backward incompatibility. Regardless of the approach we take, we could add a scrap of upgrade logic which looks for conflicting application-defined function names. If we find a conflict, we could fail the upgrade with a message advising the user that the conflict needs to be fixed first. Thanks, -Rick
          Knut Anders Hatlen made changes -
          Labels derby_triage10_10
          Urgency Normal [ 10052 ]
          Gavin made changes -
          Workflow jira [ 12721064 ] Default workflow, editable Closed status [ 12801950 ]
          Hide
          Knut Anders Hatlen added a comment -

          The functions that can be shadowed live in the SYSFUN schema and can be invoked with a qualified name, like this:

          values sysfun.sin(0.5);

          See the discussion about SQL-path in DERBY-6362. I suppose Derby's current behaviour is mostly consistent with what you'd expect if CURRENT_PATH = (CURRENT_SCHEMA, SYSFUN). What does not seem to be as expected, I think, is that

          values sin(0.5)

          has different behaviour depending on whether it is actually compiled or just fetched from the internal statement cache. The presence of an internal statement cache should be invisible to the applications. The resolution of the unqualified function name should happen when the application prepares the statement and not be influenced by identical-looking, previously compiled statements.

          Show
          Knut Anders Hatlen added a comment - The functions that can be shadowed live in the SYSFUN schema and can be invoked with a qualified name, like this: values sysfun.sin(0.5); See the discussion about SQL-path in DERBY-6362 . I suppose Derby's current behaviour is mostly consistent with what you'd expect if CURRENT_PATH = (CURRENT_SCHEMA, SYSFUN). What does not seem to be as expected, I think, is that values sin(0.5) has different behaviour depending on whether it is actually compiled or just fetched from the internal statement cache. The presence of an internal statement cache should be invisible to the applications. The resolution of the unqualified function name should happen when the application prepares the statement and not be influenced by identical-looking, previously compiled statements.
          Hide
          Dag H. Wanvik added a comment -

          +1 to that analysis. Should we store the statements in the cache using fully qualified names?

          Show
          Dag H. Wanvik added a comment - +1 to that analysis. Should we store the statements in the cache using fully qualified names?
          Hide
          Knut Anders Hatlen added a comment -

          I'm not sure. Normalizing statements before storing them in the cache would probably mean that we'd have to run them through parsing and binding, which is what the cache is there to avoid in the first place. Perhaps we could find a way to have CREATE FUNCTION evict affected statements from the cache if a function that shadows SYSFUN is created. Then we'd limit the performance hit to CREATE FUNCTION time and avoid the extra cost on each cache lookup.

          One rather broad fix, that I think is possible in the current code without adding any new dependency tracking or parsing, is to make CREATE FUNCTION invalidate all statements in the cache whose original compilation schema is the same as the schema of the shadowing function. It is broader than it needs to be, since we only need to invalidate those statements that have unqualified invocations of the affected function. But if it is limited to only those CREATE FUNCTION calls that actually create a shadowing function, it might be an acceptable price to pay, and better than building something complex in order to improve the performance of the statement cache in some rare edge cases. (Maybe these cases are so rare that it would be acceptable to make it even simpler: clear the entire statement cache each time a shadowing function is created.)

          Show
          Knut Anders Hatlen added a comment - I'm not sure. Normalizing statements before storing them in the cache would probably mean that we'd have to run them through parsing and binding, which is what the cache is there to avoid in the first place. Perhaps we could find a way to have CREATE FUNCTION evict affected statements from the cache if a function that shadows SYSFUN is created. Then we'd limit the performance hit to CREATE FUNCTION time and avoid the extra cost on each cache lookup. One rather broad fix, that I think is possible in the current code without adding any new dependency tracking or parsing, is to make CREATE FUNCTION invalidate all statements in the cache whose original compilation schema is the same as the schema of the shadowing function. It is broader than it needs to be, since we only need to invalidate those statements that have unqualified invocations of the affected function. But if it is limited to only those CREATE FUNCTION calls that actually create a shadowing function, it might be an acceptable price to pay, and better than building something complex in order to improve the performance of the statement cache in some rare edge cases. (Maybe these cases are so rare that it would be acceptable to make it even simpler: clear the entire statement cache each time a shadowing function is created.)
          Hide
          Rick Hillegas added a comment -

          Emptying the statement cache when a shadowing function is created sounds like a cheap, adequate fix for this rare edge-case. Thanks.

          Show
          Rick Hillegas added a comment - Emptying the statement cache when a shadowing function is created sounds like a cheap, adequate fix for this rare edge-case. Thanks.
          Hide
          Knut Anders Hatlen added a comment -

          There's also the question what to do with the ambiguous function names in triggers, views, generation clauses and similar, where we store the original SQL text in system tables. Currently, they will resolve to whatever was the correct function at some ill-defined point in time (such as the last time the trigger SPS was recompiled). They should resolve to whatever was the correct function at definition time.

          DERBY-6362 has some code attached to it, which replaces unqualified identifiers with schema-qualified identifiers in CHECK constraints before storing them. Although it's not the primary purpose of DERBY-6362, it has the nice side effect that CHECK constraints will preserve the schema that the function resolved to at the time the constraint was created. I think that code is general enough to be used in triggers, views and generation clauses as well.

          Show
          Knut Anders Hatlen added a comment - There's also the question what to do with the ambiguous function names in triggers, views, generation clauses and similar, where we store the original SQL text in system tables. Currently, they will resolve to whatever was the correct function at some ill-defined point in time (such as the last time the trigger SPS was recompiled). They should resolve to whatever was the correct function at definition time. DERBY-6362 has some code attached to it, which replaces unqualified identifiers with schema-qualified identifiers in CHECK constraints before storing them. Although it's not the primary purpose of DERBY-6362 , it has the nice side effect that CHECK constraints will preserve the schema that the function resolved to at the time the constraint was created. I think that code is general enough to be used in triggers, views and generation clauses as well.
          Hide
          Dag H. Wanvik added a comment -

          "storing them" : does that mean expanding the SQL text, or storing dependencies of the text in expanded form?

          Show
          Dag H. Wanvik added a comment - "storing them" : does that mean expanding the SQL text, or storing dependencies of the text in expanded form?
          Hide
          Knut Anders Hatlen added a comment -

          In the approach used in DERBY-6362, it means that the SQL text of a CHECK constraint is expanded before it is put into a column in SYS.SYSCHECKS. That is, if you execute "create table(x int, check(x < sin(pi())))", it would previously store 'check(x < sin(pi()))' in SYS.SYSCHECKS.CHECKDEFINITION, and now it will store 'check(x < "SYSFUN"."SIN"("SYSFUN"."PI"()))' instead.

          We could do the same with the SQL text that we put into SYS.SYSTRIGGERS.TRIGGERDEFINITION, SYS.SYSTRIGGERS.WHENCLAUSETEXT, SYS.SYSVIEWS.VIEWDEFINITION and SYS.SYSCOLUMNS.COLUMNDEFAULT. As an extra bonus, it will also solve DERBY-6370 for us.

          Show
          Knut Anders Hatlen added a comment - In the approach used in DERBY-6362 , it means that the SQL text of a CHECK constraint is expanded before it is put into a column in SYS.SYSCHECKS. That is, if you execute "create table(x int, check(x < sin(pi())))", it would previously store 'check(x < sin(pi()))' in SYS.SYSCHECKS.CHECKDEFINITION, and now it will store 'check(x < "SYSFUN"."SIN"("SYSFUN"."PI"()))' instead. We could do the same with the SQL text that we put into SYS.SYSTRIGGERS.TRIGGERDEFINITION, SYS.SYSTRIGGERS.WHENCLAUSETEXT, SYS.SYSVIEWS.VIEWDEFINITION and SYS.SYSCOLUMNS.COLUMNDEFAULT. As an extra bonus, it will also solve DERBY-6370 for us.
          Hide
          Dag H. Wanvik added a comment -

          Sounds good to me, +1.

          Show
          Dag H. Wanvik added a comment - Sounds good to me, +1.
          Hide
          Knut Anders Hatlen added a comment -

          One case that won't be handled by clearing the cache, is when an internal recompilation of an already prepared statement happens. Take for example this ij session:

          ij> create table t(x int);
          0 rows inserted/updated/deleted
          ij> insert into t values 1;
          1 row inserted/updated/deleted
          ij> prepare ps as 'select sin(x) from t';
          ij> execute ps;
          1                       
          ------------------------
          0.8414709848078965      
          
          1 row selected
          ij> create function sin(x int) returns int language java parameter style java external name 'java.lang.Math.abs';
          0 rows inserted/updated/deleted
          ij> execute ps;
          1                       
          ------------------------
          0.8414709848078965      
          
          1 row selected
          ij> call syscs_util.syscs_compress_table('APP', 'T', 0);
          0 rows inserted/updated/deleted
          ij> execute ps;
          1          
          -----------
          1          
          
          1 row selected
          

          Here, I think the value returned by ps should not change after the call to syscs_compress_table.

          Show
          Knut Anders Hatlen added a comment - One case that won't be handled by clearing the cache, is when an internal recompilation of an already prepared statement happens. Take for example this ij session: ij> create table t(x int); 0 rows inserted/updated/deleted ij> insert into t values 1; 1 row inserted/updated/deleted ij> prepare ps as 'select sin(x) from t'; ij> execute ps; 1 ------------------------ 0.8414709848078965 1 row selected ij> create function sin(x int) returns int language java parameter style java external name 'java.lang.Math.abs'; 0 rows inserted/updated/deleted ij> execute ps; 1 ------------------------ 0.8414709848078965 1 row selected ij> call syscs_util.syscs_compress_table('APP', 'T', 0); 0 rows inserted/updated/deleted ij> execute ps; 1 ----------- 1 1 row selected Here, I think the value returned by ps should not change after the call to syscs_compress_table.
          Knut Anders Hatlen made changes -
          Link This issue is related to DERBY-6370 [ DERBY-6370 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Rick Hillegas
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:

                Development