Derby
  1. Derby
  2. DERBY-4357

TableFunctions provide no information to limit underlying query

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.5.3.0
    • Fix Version/s: 10.6.1.0
    • Component/s: SQL
    • Labels:
      None
    • Environment:
      ALL
    • Urgency:
      Normal
    • Issue & fix info:
      Newcomer, Patch Available
    • Bug behavior facts:
      Performance

      Description

      The API specification for TableFunctions cannot provide information to the implementer of the TableFunction about the details of the query. For example:

      (a) I defined a table function named MyFunction with columns a,b, & c
      (b) I bind the table function properly using the CREATE FUNCTION SQL.

      User executes the following SQL:

      select a,b from table ( MyFunction() ) where c = 123

      Without passing the column list and/or where clause as arguments to the table function, my implementation can not know that it only needs two of the three columns, and only rows where c = 123.

      For TableFunctions that are built to integrate distant/legacy data, the cost of the query can be prohibitive. It would be better if information regarding the columns in the select and restrictions from the where clause could be passed to the developer.

      1. derby-4357-01-aa-publicAPI.diff
        14 kB
        Rick Hillegas
      2. derby-4357-02-ac-passThrough.diff
        48 kB
        Rick Hillegas
      3. derby-4357-02-ad-passThrough.diff
        56 kB
        Rick Hillegas
      4. derby-4357-03-aa-hashjoin.diff
        16 kB
        Rick Hillegas
      5. derby-4357-04-aa-moreTests.diff
        2 kB
        Rick Hillegas
      6. RestrictedTableFunctions.html
        80 kB
        Rick Hillegas
      7. RestrictedTableFunctions.html
        73 kB
        Rick Hillegas
      8. RestrictedTableFunctions.html
        73 kB
        Rick Hillegas
      9. RestrictedTableFunctions.html
        71 kB
        Rick Hillegas
      10. RestrictedTableFunctions.html
        68 kB
        Rick Hillegas

        Issue Links

          Activity

          Hide
          Knut Anders Hatlen added a comment -

          [bulk update] Close all resolved issues that haven't been updated for more than one year.

          Show
          Knut Anders Hatlen added a comment - [bulk update] Close all resolved issues that haven't been updated for more than one year.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-4357-04-aa-moreTests.diff, which adds some more tests for restricted vtis. Committed at subversion revision 832423.

          Show
          Rick Hillegas added a comment - Attaching derby-4357-04-aa-moreTests.diff, which adds some more tests for restricted vtis. Committed at subversion revision 832423.
          Hide
          Rick Hillegas added a comment -

          Thanks, Mamta. Looks good.

          Show
          Rick Hillegas added a comment - Thanks, Mamta. Looks good.
          Hide
          Mamta A. Satoor added a comment -

          Fixed a very minor javadoc error in FromVTI.java for method private HashMap computeProjection( ) throws StandardException

          Rick, I am not sure though if the comments for that function need to be changed. Maybe you can take a look at that. thanks.

          Show
          Mamta A. Satoor added a comment - Fixed a very minor javadoc error in FromVTI.java for method private HashMap computeProjection( ) throws StandardException Rick, I am not sure though if the comments for that function need to be changed. Maybe you can take a look at that. thanks.
          Hide
          Rick Hillegas added a comment -

          Tests passed cleanly for me. Committed derby-4357-03-aa-hashjoin.diff at subversion revision 831454.

          Show
          Rick Hillegas added a comment - Tests passed cleanly for me. Committed derby-4357-03-aa-hashjoin.diff at subversion revision 831454.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-4357-03-aa-hashjoin.diff. This patch adds support for hashjoins plus some miscellaneous cleanup.

          1) It turns out that when the optimizer puts a hash table on top of a VTI, the optimizer does not fabricate a ProjectRestrict node. This patch adds logic to the hash table node so it pushes its search restriction into the VTI just as the ProjectRestrict node does.

          2) It also turns out that the VTI node has enough information to compute the projection by itself. Made the computation of the projection more robust by having the VTI compute the projection itself in cases where the parent did not request this service.

          Touches the following files:

          M java/engine/org/apache/derby/impl/sql/compile/Predicate.java

          Miscellaneous cleanup of the node-printing logic.

          M java/engine/org/apache/derby/impl/sql/compile/HashTableNode.java

          Logic supporting (1).

          M java/engine/org/apache/derby/impl/sql/compile/FromVTI.java

          Logic supporting (2).

          M java/engine/org/apache/derby/impl/sql/compile/ProjectRestrictNode.java

          Moved up the logic which pushes the VTI projection/restriction so that it is parallel to what is done in HashTableNode.

          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/IntegerArrayVTI.java
          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/RestrictedVTITest.java

          More tests.

          Show
          Rick Hillegas added a comment - Attaching derby-4357-03-aa-hashjoin.diff. This patch adds support for hashjoins plus some miscellaneous cleanup. 1) It turns out that when the optimizer puts a hash table on top of a VTI, the optimizer does not fabricate a ProjectRestrict node. This patch adds logic to the hash table node so it pushes its search restriction into the VTI just as the ProjectRestrict node does. 2) It also turns out that the VTI node has enough information to compute the projection by itself. Made the computation of the projection more robust by having the VTI compute the projection itself in cases where the parent did not request this service. Touches the following files: M java/engine/org/apache/derby/impl/sql/compile/Predicate.java Miscellaneous cleanup of the node-printing logic. M java/engine/org/apache/derby/impl/sql/compile/HashTableNode.java Logic supporting (1). M java/engine/org/apache/derby/impl/sql/compile/FromVTI.java Logic supporting (2). M java/engine/org/apache/derby/impl/sql/compile/ProjectRestrictNode.java Moved up the logic which pushes the VTI projection/restriction so that it is parallel to what is done in HashTableNode. M java/testing/org/apache/derbyTesting/functionTests/tests/lang/IntegerArrayVTI.java M java/testing/org/apache/derbyTesting/functionTests/tests/lang/RestrictedVTITest.java More tests.
          Hide
          Rick Hillegas added a comment -

          Attaching revised version of the patch which adds more tests: derby-4357-02-ad-passThrough.diff. Committed at subversion revision 831072.

          Show
          Rick Hillegas added a comment - Attaching revised version of the patch which adds more tests: derby-4357-02-ad-passThrough.diff. Committed at subversion revision 831072.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-4357-02-ac-passThrough.diff. This is an initial implementation of this feature.

          Chris, you are welcome to take this for a test-drive and let me know about bugs and whether this solves your problem. Thanks.

          Here is the approach taken by this patch:

          1) The code generator decides whether to push projections and restrictions to the table function.

          2) The code generator relies on the fact that the optimizer inserts a ProjectRestrict node above the table function. As its name implies, the PR node holds the following:

          a) The projection of the table function. This is the list of all columns in the table function which are referenced in the query.

          b) The restriction of the table function. This is all of the WHERE clause fragments which can be evaluated using only columns from the table function.

          3) The code generator further relies on the fact that the optimizer has turned on the isQualifier flag on all predicates in the restriction which can be pushed into the table function. These are predicates of the form

          column OP constant

          where OP is one of the relational operators:

          < <= = > >= IS NULL IS NOT NULL

          4) At code generation time, the PR node checks to see whether its child is a FromVTI node for a RestrictedVTI. If so, the PR node tells its child to construct the column list and Restriction which will be passed to the RestrictedVTI at run-time.

          Note that if the child FromVTI doesn't understand the predicates which are passed to it, then the FromVTI computes a null Restriction. Over time, we can make this logic smarter and pass more complicated Restrictions to table functions.

          5) At run time, the VTIResultSet does the following:

          a) Clones the Restriction and plugs parameters into it in case any of the ColumnQualifiers refer to ? parameters.

          b) Stuffs the column list and Restriction into the RestrictedVTI by calling the initScan() method of the RestrictedVTI.

          Touches the following files:

          M java/engine/org/apache/derby/vti/Restriction.java

          Explicit support for IS NULL and IS NOT NULL predicates.

          M java/engine/org/apache/derby/impl/sql/compile/MethodCallNode.java

          Bind-time changes so that the declared return type of the Java method bound to a table function can be a subtype of ResultSet and not just a ResultSet.

          M java/engine/org/apache/derby/impl/sql/compile/ProjectRestrictNode.java

          Code-generation-time change: step (4) described above.

          M java/engine/org/apache/derby/impl/sql/compile/FromVTI.java

          Mostly code-generation-time changes to support step (4) above.

          M java/engine/org/apache/derby/impl/sql/execute/GenericResultSetFactory.java
          M java/engine/org/apache/derby/iapi/sql/execute/ResultSetFactory.java
          M java/engine/org/apache/derby/impl/sql/execute/VTIResultSet.java

          Run-time changes to support step (5) above.

          M tools/javadoc/publishedapi.ant

          Wire the new public api into the published javadoc.

          A java/testing/org/apache/derbyTesting/functionTests/tests/lang/IntegerArrayVTI.java
          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/_Suite.java
          A java/testing/org/apache/derbyTesting/functionTests/tests/lang/RestrictedVTITest.java

          A couple initial tests to verify the soundness of this approach. Follow-on patches will supply more tests.

          Show
          Rick Hillegas added a comment - Attaching derby-4357-02-ac-passThrough.diff. This is an initial implementation of this feature. Chris, you are welcome to take this for a test-drive and let me know about bugs and whether this solves your problem. Thanks. Here is the approach taken by this patch: 1) The code generator decides whether to push projections and restrictions to the table function. 2) The code generator relies on the fact that the optimizer inserts a ProjectRestrict node above the table function. As its name implies, the PR node holds the following: a) The projection of the table function. This is the list of all columns in the table function which are referenced in the query. b) The restriction of the table function. This is all of the WHERE clause fragments which can be evaluated using only columns from the table function. 3) The code generator further relies on the fact that the optimizer has turned on the isQualifier flag on all predicates in the restriction which can be pushed into the table function. These are predicates of the form column OP constant where OP is one of the relational operators: < <= = > >= IS NULL IS NOT NULL 4) At code generation time, the PR node checks to see whether its child is a FromVTI node for a RestrictedVTI. If so, the PR node tells its child to construct the column list and Restriction which will be passed to the RestrictedVTI at run-time. Note that if the child FromVTI doesn't understand the predicates which are passed to it, then the FromVTI computes a null Restriction. Over time, we can make this logic smarter and pass more complicated Restrictions to table functions. 5) At run time, the VTIResultSet does the following: a) Clones the Restriction and plugs parameters into it in case any of the ColumnQualifiers refer to ? parameters. b) Stuffs the column list and Restriction into the RestrictedVTI by calling the initScan() method of the RestrictedVTI. Touches the following files: M java/engine/org/apache/derby/vti/Restriction.java Explicit support for IS NULL and IS NOT NULL predicates. M java/engine/org/apache/derby/impl/sql/compile/MethodCallNode.java Bind-time changes so that the declared return type of the Java method bound to a table function can be a subtype of ResultSet and not just a ResultSet. M java/engine/org/apache/derby/impl/sql/compile/ProjectRestrictNode.java Code-generation-time change: step (4) described above. M java/engine/org/apache/derby/impl/sql/compile/FromVTI.java Mostly code-generation-time changes to support step (4) above. M java/engine/org/apache/derby/impl/sql/execute/GenericResultSetFactory.java M java/engine/org/apache/derby/iapi/sql/execute/ResultSetFactory.java M java/engine/org/apache/derby/impl/sql/execute/VTIResultSet.java Run-time changes to support step (5) above. M tools/javadoc/publishedapi.ant Wire the new public api into the published javadoc. A java/testing/org/apache/derbyTesting/functionTests/tests/lang/IntegerArrayVTI.java M java/testing/org/apache/derbyTesting/functionTests/tests/lang/_Suite.java A java/testing/org/apache/derbyTesting/functionTests/tests/lang/RestrictedVTITest.java A couple initial tests to verify the soundness of this approach. Follow-on patches will supply more tests.
          Hide
          Rick Hillegas added a comment -

          Attaching version 6 of the functional spec. This incorporates the following changes:

          Small changes to Restriction class based on initial implementation: 1) removed confusing nullEqualsNull field of ColumnQualifier and added new operators "IS NULL" and "IS NOT NULL", 2) made Restrictions Serializable so that they can be stored in PreparedStatements.

          Show
          Rick Hillegas added a comment - Attaching version 6 of the functional spec. This incorporates the following changes: Small changes to Restriction class based on initial implementation: 1) removed confusing nullEqualsNull field of ColumnQualifier and added new operators "IS NULL" and "IS NOT NULL", 2) made Restrictions Serializable so that they can be stored in PreparedStatements.
          Hide
          Rick Hillegas added a comment -

          Thanks, Sylvain. I expect that I will be attaching a patch for this feature later today.

          Show
          Rick Hillegas added a comment - Thanks, Sylvain. I expect that I will be attaching a patch for this feature later today.
          Hide
          Sylvain Leroux added a comment -

          Hi,

          I reviewed the comments of this page and studied the specs.
          I think, with some guidance, I could cope with this issue.

          Sylvain

          Show
          Sylvain Leroux added a comment - Hi, I reviewed the comments of this page and studied the specs. I think, with some guidance, I could cope with this issue. Sylvain
          Hide
          Rick Hillegas added a comment -

          Committed public api derby-4357-01-aa-publicAPI.diff at subversion revision 829078.

          Show
          Rick Hillegas added a comment - Committed public api derby-4357-01-aa-publicAPI.diff at subversion revision 829078.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-4357-01-aa-publicAPI.diff. This is just the public API defined in the spec. I thought I would snapshot this in JIRA. Introduces the following classes:

          A java/engine/org/apache/derby/vti/Restriction.java
          A java/engine/org/apache/derby/vti/RestrictedVTI.java

          Show
          Rick Hillegas added a comment - Attaching derby-4357-01-aa-publicAPI.diff. This is just the public API defined in the spec. I thought I would snapshot this in JIRA. Introduces the following classes: A java/engine/org/apache/derby/vti/Restriction.java A java/engine/org/apache/derby/vti/RestrictedVTI.java
          Hide
          Rick Hillegas added a comment -

          Thanks for the feedback, Chris. You're right, the ddl example is confusing. I will fix that in the next rev of the spec. I'm hoping to get to this feature during 10.6 development, but there are some other items ahead of it in the queue. Thanks.

          Show
          Rick Hillegas added a comment - Thanks for the feedback, Chris. You're right, the ddl example is confusing. I will fix that in the next rev of the spec. I'm hoping to get to this feature during 10.6 development, but there are some other items ahead of it in the queue. Thanks.
          Hide
          Chris Goodacre added a comment -

          Rick, the spec looks good. I was showing it to a colleague not familiar with TableFunctions and he was confused slightly by the example of declaring the table function:

          create function foreignDatabaseEmployeeTable()
          returns table
          (
          id int,
          birthday date,
          taxPayerID varchar( 50 ),
          firstName varchar( 50 ),
          lastName varchar( 50 )
          )
          language java
          parameter style DERBY_JDBC_RESULT_SET
          no sql
          external name 'com.acme.portal.foreignDatabaseEmployeeTable'

          The external name should be a FQCN + method name, right? It looks like either you omitted the classname, or 'portal' is your classname and the lowercase is confusing.

          Otherwise the spec looks solid. Any guestimate on whether our 1 vote will be enough to get this selected for implementation? Do I need to sign up all my dead relatives with JIRA accounts (Chicago-style) to get this over the hump? :-D

          Show
          Chris Goodacre added a comment - Rick, the spec looks good. I was showing it to a colleague not familiar with TableFunctions and he was confused slightly by the example of declaring the table function: create function foreignDatabaseEmployeeTable() returns table ( id int, birthday date, taxPayerID varchar( 50 ), firstName varchar( 50 ), lastName varchar( 50 ) ) language java parameter style DERBY_JDBC_RESULT_SET no sql external name 'com.acme.portal.foreignDatabaseEmployeeTable' The external name should be a FQCN + method name, right? It looks like either you omitted the classname, or 'portal' is your classname and the lowercase is confusing. Otherwise the spec looks solid. Any guestimate on whether our 1 vote will be enough to get this selected for implementation? Do I need to sign up all my dead relatives with JIRA accounts (Chicago-style) to get this over the hump? :-D
          Hide
          Rick Hillegas added a comment -

          Attaching 4th version of the spec, incorporating feedback from Kim.

          Show
          Rick Hillegas added a comment - Attaching 4th version of the spec, incorporating feedback from Kim.
          Hide
          Rick Hillegas added a comment -

          Attaching a new rev of the spec, incorporating Knut's feedback.

          Show
          Rick Hillegas added a comment - Attaching a new rev of the spec, incorporating Knut's feedback.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for the updated spec, Rick. It looks good to me. Some small nits:

          1) The example in the Behavior section should be updated and use the initScan() method.

          2) In the javadoc for initScan() it would be good to clarify whether the "columns which need to be materialized" means the columns referenced in the select list or the union of the columns in the select list and the columns in the restrictions. That is, which column names does the array contain in the following example: select lastName from table( foreignEmployeeTable() ) s where s id = 5

          Show
          Knut Anders Hatlen added a comment - Thanks for the updated spec, Rick. It looks good to me. Some small nits: 1) The example in the Behavior section should be updated and use the initScan() method. 2) In the javadoc for initScan() it would be good to clarify whether the "columns which need to be materialized" means the columns referenced in the select list or the union of the columns in the select list and the columns in the restrictions. That is, which column names does the array contain in the following example: select lastName from table( foreignEmployeeTable() ) s where s id = 5
          Hide
          Rick Hillegas added a comment -

          Attaching second rev of spec, incorporating feedback from Knut and Chris.

          Show
          Rick Hillegas added a comment - Attaching second rev of spec, incorporating feedback from Knut and Chris.
          Hide
          Rick Hillegas added a comment - - edited

          Thanks for the additional comments, Knut and Chris. Some responses follow:

          1) I do see the value of enforcing the restriction outside the table function even if the restriction is pushed into the table function. I'll change the spec to say that's what we'll do. However, heads up: I may revert back to the original contract if this turns out to be difficult to implement. In that case, we can file a separate JIRA for this relaxed contract--it would be backward-compatible with the original, more exacting contract.

          2) In order to remove ambiguity about the order in which the RestrictedVTI methods are called, I will combine them into a single initScan() method.

          3) I will also clarify that initScan() is called once per scan and is called before any other ResultSet methods are called.

          4) It's certainly possible to build on this work and turn an IN list into a series of probes of the table function. This should be possible for table functions which satisfiy the following conditions:

          a) their returned ResultSets implement RestrictedVTI

          b) they are declared in classes which implement org.apache.derby.vti VTICosting, provided that VTICosting.supportsMultipleInstantiations() returns true.

          This, however, would be another JIRA. For the record, I don't think that you can get around this problem by rewriting your IN list as a series of ORs. I think that the Derby optimizer will outfox you and transform that series of ORs back into an IN list which won't operate efficiently on a table function, even after this JIRA is done. See http://db.apache.org/derby/docs/10.5/tuning/tuning-single.html#rtuntransform590 I will note this in the spec.

          5) I think that a related improvement would be the ability to optimize joins with table functions which satisfy the conditions in (4). That's another JIRA, too.

          6) I think that the LIKE and BETWEEN optimizations should play well with this feature, but we'll have to prove that when this work is done. Those optimizations are described here: http://db.apache.org/derby/docs/10.5/tuning/tuning-single.html#rtuntransform139

          7) I don't see much value in supporting the <> operator. However, if we added this operator to the list of simple comparisons which can be pushed into the Derby store, then we could piggyback the corresponding table function work on top of that effort.

          What other operations would you like to push into table functions?

          Thanks,
          -Rick

          Show
          Rick Hillegas added a comment - - edited Thanks for the additional comments, Knut and Chris. Some responses follow: 1) I do see the value of enforcing the restriction outside the table function even if the restriction is pushed into the table function. I'll change the spec to say that's what we'll do. However, heads up: I may revert back to the original contract if this turns out to be difficult to implement. In that case, we can file a separate JIRA for this relaxed contract--it would be backward-compatible with the original, more exacting contract. 2) In order to remove ambiguity about the order in which the RestrictedVTI methods are called, I will combine them into a single initScan() method. 3) I will also clarify that initScan() is called once per scan and is called before any other ResultSet methods are called. 4) It's certainly possible to build on this work and turn an IN list into a series of probes of the table function. This should be possible for table functions which satisfiy the following conditions: a) their returned ResultSets implement RestrictedVTI b) they are declared in classes which implement org.apache.derby.vti VTICosting, provided that VTICosting.supportsMultipleInstantiations() returns true. This, however, would be another JIRA. For the record, I don't think that you can get around this problem by rewriting your IN list as a series of ORs. I think that the Derby optimizer will outfox you and transform that series of ORs back into an IN list which won't operate efficiently on a table function, even after this JIRA is done. See http://db.apache.org/derby/docs/10.5/tuning/tuning-single.html#rtuntransform590 I will note this in the spec. 5) I think that a related improvement would be the ability to optimize joins with table functions which satisfy the conditions in (4). That's another JIRA, too. 6) I think that the LIKE and BETWEEN optimizations should play well with this feature, but we'll have to prove that when this work is done. Those optimizations are described here: http://db.apache.org/derby/docs/10.5/tuning/tuning-single.html#rtuntransform139 7) I don't see much value in supporting the <> operator. However, if we added this operator to the list of simple comparisons which can be pushed into the Derby store, then we could piggyback the corresponding table function work on top of that effort. What other operations would you like to push into table functions? Thanks, -Rick
          Hide
          Chris Goodacre added a comment -

          Related to Knut's comment (and your reply) about <>, what would be the path to add that operator and other operators that fall outside of the Orderable interface (e.g. IN and NOT IN). Would this mechanism ever support a query like this:

          create function foreignEmployeeTable()
          returns table
          (
          id int,
          birthDay date,
          status character(1),
          firstName varchar( 100 ),
          lastName varchar( 100 )
          )

          select lastName from table( foreignEmployeeTable() ) s where status in ('A','L','S')

          Show
          Chris Goodacre added a comment - Related to Knut's comment (and your reply) about <>, what would be the path to add that operator and other operators that fall outside of the Orderable interface (e.g. IN and NOT IN). Would this mechanism ever support a query like this: create function foreignEmployeeTable() returns table ( id int, birthDay date, status character(1), firstName varchar( 100 ), lastName varchar( 100 ) ) select lastName from table( foreignEmployeeTable() ) s where status in ('A','L','S')
          Hide
          Chris Goodacre added a comment -

          I think it would be helpful to be a little clearer about the semantics and invocation of the the RestrictedVTI interface methods. I'd like to see something like this (if this is actually correct):

          There is no implicit or explicit order of invocation for setMaterializedColumnNames() and setRestriction().

          The setMaterializedColumnNames(...) method is guaranteed to be invoked only 1 time per TableFunction execution. This invocation must occur prior to any invocation of any of the following methods:
          getXXXXXX()
          next()
          isFirst()
          first()
          isLast()
          last()

          The setRestriction(....) method is guaranteed to be invoked only 1 time per TableFunction execution. NOTE: the Restriction class is a binary tree representing all restrictions in the WHERE clause of the underlying query in which a column/field from the TableFunction is compared against a constant value. Specifically, setRestriction is not invoked once for each restriction. This invocation must occur prior to any invocation of the methods enumated in setMaterializedColumnNames().

          Btw- I realize that the API doc for Restriction also specifically states that it is a binary tree, I just think it's helpful in the interface doc to head off any misunderstanding.

          Show
          Chris Goodacre added a comment - I think it would be helpful to be a little clearer about the semantics and invocation of the the RestrictedVTI interface methods. I'd like to see something like this (if this is actually correct): There is no implicit or explicit order of invocation for setMaterializedColumnNames() and setRestriction(). The setMaterializedColumnNames(...) method is guaranteed to be invoked only 1 time per TableFunction execution. This invocation must occur prior to any invocation of any of the following methods: getXXXXXX() next() isFirst() first() isLast() last() The setRestriction(....) method is guaranteed to be invoked only 1 time per TableFunction execution. NOTE: the Restriction class is a binary tree representing all restrictions in the WHERE clause of the underlying query in which a column/field from the TableFunction is compared against a constant value. Specifically, setRestriction is not invoked once for each restriction. This invocation must occur prior to any invocation of the methods enumated in setMaterializedColumnNames(). Btw- I realize that the API doc for Restriction also specifically states that it is a binary tree, I just think it's helpful in the interface doc to head off any misunderstanding.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for the clarifications, Rick.

          I agree that if we add a mechanism to let the table function return the results in a specific order, it would be pointless to do an extra sort in Derby. I would argue though that there is a difference between pushing restrictions and ensuring ordering.

          With the ordering, the table function itself declares that it can return the rows ordered by column X, so it's reasonable to expect that it's true that they will in fact be ordered by X. Also, Derby won't ever ask it to order by column Y. So for column X, where you have an easy way to do the ordering inside the table function, you can use the power of Java to order the rows, whereas for column Y you rely on the power of SQL, so you get to combine the strengths of both Java and SQL.

          With the restrictions, the table function has no way to say that it only accepts restrictions on a certain column, or only certain kinds of operators. So if the table function is required to enforce all restrictions, you can no longer use the power of SQL for those restrictions where you have no advantages of implementing them in Java. (Also, though perhaps not likely to ever be a problem, new operators cannot be added later without breaking the existing restricted VTIs, as old table functions wouldn't know how to handle them, but they are not allowed to ignore them.)

          As an example, say that we have a table function F(K,V) which is simply wrapping a hash table. A restriction such as K='abc' would be very helpful to push down, as it could be used as an argument to Hashtable.get() and reduce the number of rows returned to one (or zero). However, restrictions on K with operators such as <,>,<=,>=, or any restriction on V, would require that the entire Hashtable was scanned. One would also have to implement evaluation of each operator on each column, and one would have to handle arbitrarily nested AND/OR operators. Whereas this is indeed possible to do, it sounds like wasted work to reimplement what's already implemented in the SQL engine for no added benefit.

          If the restrictions are only seen as a hint, this table function would need only a couple of lines of code to extract the equals restrictions on K. The full table scan needed for more complex restrictions could be left to the SQL engine, which should be able to perform the scan just as efficiently as the user's hand-coded scan. Also, the hand-coded scan will probably have undergone less testing and will be more error-prone than the code in the SQL engine.

          Show
          Knut Anders Hatlen added a comment - Thanks for the clarifications, Rick. I agree that if we add a mechanism to let the table function return the results in a specific order, it would be pointless to do an extra sort in Derby. I would argue though that there is a difference between pushing restrictions and ensuring ordering. With the ordering, the table function itself declares that it can return the rows ordered by column X, so it's reasonable to expect that it's true that they will in fact be ordered by X. Also, Derby won't ever ask it to order by column Y. So for column X, where you have an easy way to do the ordering inside the table function, you can use the power of Java to order the rows, whereas for column Y you rely on the power of SQL, so you get to combine the strengths of both Java and SQL. With the restrictions, the table function has no way to say that it only accepts restrictions on a certain column, or only certain kinds of operators. So if the table function is required to enforce all restrictions, you can no longer use the power of SQL for those restrictions where you have no advantages of implementing them in Java. (Also, though perhaps not likely to ever be a problem, new operators cannot be added later without breaking the existing restricted VTIs, as old table functions wouldn't know how to handle them, but they are not allowed to ignore them.) As an example, say that we have a table function F(K,V) which is simply wrapping a hash table. A restriction such as K='abc' would be very helpful to push down, as it could be used as an argument to Hashtable.get() and reduce the number of rows returned to one (or zero). However, restrictions on K with operators such as <,>,<=,>=, or any restriction on V, would require that the entire Hashtable was scanned. One would also have to implement evaluation of each operator on each column, and one would have to handle arbitrarily nested AND/OR operators. Whereas this is indeed possible to do, it sounds like wasted work to reimplement what's already implemented in the SQL engine for no added benefit. If the restrictions are only seen as a hint, this table function would need only a couple of lines of code to extract the equals restrictions on K. The full table scan needed for more complex restrictions could be left to the SQL engine, which should be able to perform the scan just as efficiently as the user's hand-coded scan. Also, the hand-coded scan will probably have undergone less testing and will be more error-prone than the code in the SQL engine.
          Hide
          Chris Goodacre added a comment -

          Thanks for the pointer, Rick. I'll take a look at that example.

          Show
          Chris Goodacre added a comment - Thanks for the pointer, Rick. I'll take a look at that example.
          Hide
          Rick Hillegas added a comment -

          Hi Chris,

          It sounds to me like the functionality you just described belongs in another JIRA. You might want to take a look at the demo under java/demo/vtis. That shows you how to define a family of table functions using annotations. In particular, take a look at java/demo/vtis/org/apache/derbyDemo/vtis/example/WorldDBSnapshot. That class uses annotations to define a series of parameterized snapshots of tables from a foreign database.

          Show
          Rick Hillegas added a comment - Hi Chris, It sounds to me like the functionality you just described belongs in another JIRA. You might want to take a look at the demo under java/demo/vtis. That shows you how to define a family of table functions using annotations. In particular, take a look at java/demo/vtis/org/apache/derbyDemo/vtis/example/WorldDBSnapshot. That class uses annotations to define a series of parameterized snapshots of tables from a foreign database.
          Hide
          Chris Goodacre added a comment -

          Regarding the functional spec, its emphasis is on performance tuning, which follows our original discussion thread. Is it worth discussing other information that might be informative to the TableFunction implementation? For instance: say I set out to use Derby as a relational cache for a separate system. I write a TableFunction that checks the freshness of the data in a table (inside a Derby database) that mirrors the external system. I'll call that local Derby table the "cache table". If the data is stale, I grab the data from the external system and push it into the cache table.

          Question: Is there a way to write this function generically? (I don't think so, since the TableFunction doesn't know which table the query was executed against). If I have each customer in their own database or schema, how do I know which database or schema I should check the cache table in?

          Admittedly, these are not TableFunction performance optimizations. Should they be a separate enhancement request or do we deal with all tablefunction-informing APIs at once?

          Show
          Chris Goodacre added a comment - Regarding the functional spec, its emphasis is on performance tuning, which follows our original discussion thread. Is it worth discussing other information that might be informative to the TableFunction implementation? For instance: say I set out to use Derby as a relational cache for a separate system. I write a TableFunction that checks the freshness of the data in a table (inside a Derby database) that mirrors the external system. I'll call that local Derby table the "cache table". If the data is stale, I grab the data from the external system and push it into the cache table. Question: Is there a way to write this function generically? (I don't think so, since the TableFunction doesn't know which table the query was executed against). If I have each customer in their own database or schema, how do I know which database or schema I should check the cache table in? Admittedly, these are not TableFunction performance optimizations. Should they be a separate enhancement request or do we deal with all tablefunction-informing APIs at once?
          Hide
          Chris Goodacre added a comment -

          >3) The array passed to setMaterializedColumnNames() contains somewhat redundant information (both names and positions of the columns to >materialize). Would an array of booleans suffice? Or could the extra information be used for something?

          >>I agree that the information is redundant and my original suggestion was to use a bitmap. However, Chris reported that this would be >>cumbersome to use in the real world. I don't think that the redundancy is harmful. In the meantime, I have warmed up to the redundancy. That's >>because I can see that it will make it very easy to write a generic table function that wraps a SELECT against a foreign table and allows you to >>push projections and restrictions into the foreign query.

          Specifically, I noted that a BitMap (or an array of [Bb]ooleans) would force my code (I believe) to know the order of the columns as they were defined in the function. Currently, I don't have that dependency, and if I could get away without it, that was preferable to me.

          Show
          Chris Goodacre added a comment - >3) The array passed to setMaterializedColumnNames() contains somewhat redundant information (both names and positions of the columns to >materialize). Would an array of booleans suffice? Or could the extra information be used for something? >>I agree that the information is redundant and my original suggestion was to use a bitmap. However, Chris reported that this would be >>cumbersome to use in the real world. I don't think that the redundancy is harmful. In the meantime, I have warmed up to the redundancy. That's >>because I can see that it will make it very easy to write a generic table function that wraps a SELECT against a foreign table and allows you to >>push projections and restrictions into the foreign query. Specifically, I noted that a BitMap (or an array of [Bb] ooleans) would force my code (I believe) to know the order of the columns as they were defined in the function. Currently, I don't have that dependency, and if I could get away without it, that was preferable to me.
          Hide
          Rick Hillegas added a comment -

          Thanks for the quick feedback, Knut.

          >1) It wasn't entirely clear to me how the _nullEqualsNull field in Restriction.ColumnQualifier is supposed to be used. Is the user supposed to specify somehow that this VTI treats NULLs in a non-standard way? If so, how (and why)? Or is it meant to implement IS (NOT) NULL restrictions? Or something else?

          It is meant to implement IS NULL. I will clarify this.

          >2) The operator '<>' is not defined. Is that intentional?

          Yes. The idea is to model the Qualifier functionality which the optimizer already understands. The comparisons are the same ones found in org.apache.derby.iapi.types.Orderable.

          >3) The array passed to setMaterializedColumnNames() contains somewhat redundant information (both names and positions of the columns to materialize). Would an array of booleans suffice? Or could the extra information be used for something?

          I agree that the information is redundant and my original suggestion was to use a bitmap. However, Chris reported that this would be cumbersome to use in the real world. I don't think that the redundancy is harmful. In the meantime, I have warmed up to the redundancy. That's because I can see that it will make it very easy to write a generic table function that wraps a SELECT against a foreign table and allows you to push projections and restrictions into the foreign query.

          4) I cannot find that the spec says what happens if the restricted VTI returns non-qualifying rows. Should the restrictions be reevaluated outside the VTI? (I think I would prefer that the restrictions were only seen as hints, so that the VTI doesn't have to check all of them. Then, if there's a restriction on a column that doesn't really help the VTI retrieving the rows more efficiently, it could just disregard that restriction and let the SQL do the filtering instead.)

          The intent is that the table function has signed up for a contract to filter rows. If the table function doesn't fulfill its contract, then the table function has a bug. I will clarify this in the spec. I do see your point that redundant enforcement of the qualification in Derby could be useful to some users. I'm prepared to revisit this part of the spec during implementation if it turns out that the query processor already redundantly enforces restrictions even after pushing them into the store layer.

          The reason that I prefer to regard the filtering as a contract is that I think it will make it easier to reason about these smart table functions when we make them even smarter: if a table function claims that it can perform a piece of processing, then Derby should be able to rely on that claim. I'm thinking that at some point we will want to add a similar mechanism so that a table function can declare that it can return rows in certain sort orders. I think that if Derby asks a table function to sort the rows, then Derby doesn't want to perform a redundant sort outside the table function.

          Thanks!

          Show
          Rick Hillegas added a comment - Thanks for the quick feedback, Knut. >1) It wasn't entirely clear to me how the _nullEqualsNull field in Restriction.ColumnQualifier is supposed to be used. Is the user supposed to specify somehow that this VTI treats NULLs in a non-standard way? If so, how (and why)? Or is it meant to implement IS (NOT) NULL restrictions? Or something else? It is meant to implement IS NULL. I will clarify this. >2) The operator '<>' is not defined. Is that intentional? Yes. The idea is to model the Qualifier functionality which the optimizer already understands. The comparisons are the same ones found in org.apache.derby.iapi.types.Orderable. >3) The array passed to setMaterializedColumnNames() contains somewhat redundant information (both names and positions of the columns to materialize). Would an array of booleans suffice? Or could the extra information be used for something? I agree that the information is redundant and my original suggestion was to use a bitmap. However, Chris reported that this would be cumbersome to use in the real world. I don't think that the redundancy is harmful. In the meantime, I have warmed up to the redundancy. That's because I can see that it will make it very easy to write a generic table function that wraps a SELECT against a foreign table and allows you to push projections and restrictions into the foreign query. 4) I cannot find that the spec says what happens if the restricted VTI returns non-qualifying rows. Should the restrictions be reevaluated outside the VTI? (I think I would prefer that the restrictions were only seen as hints, so that the VTI doesn't have to check all of them. Then, if there's a restriction on a column that doesn't really help the VTI retrieving the rows more efficiently, it could just disregard that restriction and let the SQL do the filtering instead.) The intent is that the table function has signed up for a contract to filter rows. If the table function doesn't fulfill its contract, then the table function has a bug. I will clarify this in the spec. I do see your point that redundant enforcement of the qualification in Derby could be useful to some users. I'm prepared to revisit this part of the spec during implementation if it turns out that the query processor already redundantly enforces restrictions even after pushing them into the store layer. The reason that I prefer to regard the filtering as a contract is that I think it will make it easier to reason about these smart table functions when we make them even smarter: if a table function claims that it can perform a piece of processing, then Derby should be able to rely on that claim. I'm thinking that at some point we will want to add a similar mechanism so that a table function can declare that it can return rows in certain sort orders. I think that if Derby asks a table function to sort the rows, then Derby doesn't want to perform a redundant sort outside the table function. Thanks!
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for writing the functional spec, Rick. It looks good and fairly complete. I have a couple of questions, though:

          1) It wasn't entirely clear to me how the _nullEqualsNull field in Restriction.ColumnQualifier is supposed to be used. Is the user supposed to specify somehow that this VTI treats NULLs in a non-standard way? If so, how (and why)? Or is it meant to implement IS (NOT) NULL restrictions? Or something else?

          2) The operator '<>' is not defined. Is that intentional?

          3) The array passed to setMaterializedColumnNames() contains somewhat redundant information (both names and positions of the columns to materialize). Would an array of booleans suffice? Or could the extra information be used for something?

          4) I cannot find that the spec says what happens if the restricted VTI returns non-qualifying rows. Should the restrictions be reevaluated outside the VTI? (I think I would prefer that the restrictions were only seen as hints, so that the VTI doesn't have to check all of them. Then, if there's a restriction on a column that doesn't really help the VTI retrieving the rows more efficiently, it could just disregard that restriction and let the SQL do the filtering instead.)

          Show
          Knut Anders Hatlen added a comment - Thanks for writing the functional spec, Rick. It looks good and fairly complete. I have a couple of questions, though: 1) It wasn't entirely clear to me how the _nullEqualsNull field in Restriction.ColumnQualifier is supposed to be used. Is the user supposed to specify somehow that this VTI treats NULLs in a non-standard way? If so, how (and why)? Or is it meant to implement IS (NOT) NULL restrictions? Or something else? 2) The operator '<>' is not defined. Is that intentional? 3) The array passed to setMaterializedColumnNames() contains somewhat redundant information (both names and positions of the columns to materialize). Would an array of booleans suffice? Or could the extra information be used for something? 4) I cannot find that the spec says what happens if the restricted VTI returns non-qualifying rows. Should the restrictions be reevaluated outside the VTI? (I think I would prefer that the restrictions were only seen as hints, so that the VTI doesn't have to check all of them. Then, if there's a restriction on a column that doesn't really help the VTI retrieving the rows more efficiently, it could just disregard that restriction and let the SQL do the filtering instead.)
          Hide
          Rick Hillegas added a comment -

          Attaching a first rev of a functional spec for this feature. The spec describes a new public Derby interface, RestrictedVTI. Table functions which implement this interface can be told which columns they must materialize. In addition, table functions which implement this interface can be given simple bounds to apply to those columns.

          Show
          Rick Hillegas added a comment - Attaching a first rev of a functional spec for this feature. The spec describes a new public Derby interface, RestrictedVTI. Table functions which implement this interface can be told which columns they must materialize. In addition, table functions which implement this interface can be given simple bounds to apply to those columns.
          Hide
          Rick Hillegas added a comment -

          More information on this issue can be found in the following email thread: http://www.nabble.com/Question-about-TableFunctions-in-Derby-td24491389.html#a24491389

          Show
          Rick Hillegas added a comment - More information on this issue can be found in the following email thread: http://www.nabble.com/Question-about-TableFunctions-in-Derby-td24491389.html#a24491389

            People

            • Assignee:
              Rick Hillegas
              Reporter:
              Chris Goodacre
            • Votes:
              1 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development