Derby
  1. Derby
  2. DERBY-455

Add support for creating index on expressions

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0.2.1
    • Fix Version/s: None
    • Component/s: SQL
    • Environment:
      Solaris
    • Urgency:
      Low

      Description

      As per Derby doc, indexes can be created only on simple column names.

      Creation of index on expressions will boost performance of queries with those expressions in where clause.

        Issue Links

          Activity

          Hide
          Rick Hillegas added a comment -

          Link the function-index issue to the generated-column issue since they are closely related.

          Show
          Rick Hillegas added a comment - Link the function-index issue to the generated-column issue since they are closely related.
          Hide
          Rick Hillegas added a comment -

          See also enhancement request 481: Generated Columns. These would be even easier to implement and would satisfy the performance requirement.

          Show
          Rick Hillegas added a comment - See also enhancement request 481: Generated Columns. These would be even easier to implement and would satisfy the performance requirement.
          Hide
          Rick Hillegas added a comment -

          Thanks, Mike, for reminding me that in Derby the Language maintains the index rows as well as the base row. This seems pretty clear from looking at InsertConstantAction and friends. Your solution (3) then looks like the best approach to me. I suspect it's less Language work than (1) because the Language doesn't have to kludge around hidden columns (I think the details here would turn into a surprising amount of work). Probably it's less Store work too since we don't have to implement DROP COLUMN.

          Show
          Rick Hillegas added a comment - Thanks, Mike, for reminding me that in Derby the Language maintains the index rows as well as the base row. This seems pretty clear from looking at InsertConstantAction and friends. Your solution (3) then looks like the best approach to me. I suspect it's less Language work than (1) because the Language doesn't have to kludge around hidden columns (I think the details here would turn into a surprising amount of work). Probably it's less Store work too since we don't have to implement DROP COLUMN.
          Hide
          Mike Matrigali added a comment -

          I think option 1 is easiest, I also believe there is another option which would require slightly less storage but would
          require more work. I think I would lean toward option 1, but seemed reasonable to add this option to the
          discussion. I'll call it option 3:
          3) Index only columns. In this approach, declaring an expression as an indexed column would cause the
          Language to create an an index on the resulting value of the function, but would not change the base
          table at all. As in option 1 at all inserts and at updates which affect columns which are parameters to the
          function appropriate index modifications would have to be made. As in option 1 the optimizer would have
          to know the mapping between the expression and the index. Also Language would have to understand that
          result value of the expression only exists in the index, and not in the base table.

          With respect to option 1 I think that "invisible" columns are best handled by the language layer, in the existing
          catalogs which describe columns in a table. There is not really much room in store to store metadata about
          the columns, and no existing interfaces to return this metadata back to the caller.
          Language at compile time simply uses the existing catalog information to compile a plan which returns
          the appropriate columns and uses the appropriate "hidden" columns for a select *.
          In this way, store just maintains the invisible columns as normal
          columns and no special support is necessary - drop column support is already available.

          To debug such a feature, it may be useful to add a way to query the hidden column.

          I am having a hard time understanding option 2, and it may be because of the assumption of the current
          derby base table/index architecture. Currently store does not know anything about the association of the
          index and the base table, I think this is good as it allows different types of more complex indexes in the
          future built on the basic btree building block. I think I am not seeing what actually gets stored in the btree in this case - I don't
          see the value of storing anything other than the result of the function in the btree, and then keying on that
          value. Is the suggestion here to have the insert into the btree take the argument columns and internally
          run the function rather than have the caller provide the result of the function?

          Once this type of feature is implemented I think the next kind of index which is interesting is one which has N
          index entries for every one base table row. I believe XML indexing would
          benefit from such an index.

          Show
          Mike Matrigali added a comment - I think option 1 is easiest, I also believe there is another option which would require slightly less storage but would require more work. I think I would lean toward option 1, but seemed reasonable to add this option to the discussion. I'll call it option 3: 3) Index only columns. In this approach, declaring an expression as an indexed column would cause the Language to create an an index on the resulting value of the function, but would not change the base table at all. As in option 1 at all inserts and at updates which affect columns which are parameters to the function appropriate index modifications would have to be made. As in option 1 the optimizer would have to know the mapping between the expression and the index. Also Language would have to understand that result value of the expression only exists in the index, and not in the base table. With respect to option 1 I think that "invisible" columns are best handled by the language layer, in the existing catalogs which describe columns in a table. There is not really much room in store to store metadata about the columns, and no existing interfaces to return this metadata back to the caller. Language at compile time simply uses the existing catalog information to compile a plan which returns the appropriate columns and uses the appropriate "hidden" columns for a select *. In this way, store just maintains the invisible columns as normal columns and no special support is necessary - drop column support is already available. To debug such a feature, it may be useful to add a way to query the hidden column. I am having a hard time understanding option 2, and it may be because of the assumption of the current derby base table/index architecture. Currently store does not know anything about the association of the index and the base table, I think this is good as it allows different types of more complex indexes in the future built on the basic btree building block. I think I am not seeing what actually gets stored in the btree in this case - I don't see the value of storing anything other than the result of the function in the btree, and then keying on that value. Is the suggestion here to have the insert into the btree take the argument columns and internally run the function rather than have the caller provide the result of the function? Once this type of feature is implemented I think the next kind of index which is interesting is one which has N index entries for every one base table row. I believe XML indexing would benefit from such an index.
          Hide
          Rick Hillegas added a comment -

          Here's a stab at formalizing this request a bit. First we start out with some definitions:

          linearDatatype - Any datatype on which you can build a Btree index

          stableOperator - A side-effect free SQL operator which always returns the same value when given the same operands. This includes the arithmetic, string, and boolean operators: +, -, *, /, <, >, =, CAST, ||.

          stableFunction - A side-effect-free function which always returns the same value when given the same parameters. Being side-effect-free means that a stableFunction may not alter the database environment. For instance, it may not issue queries or alter the connection in any way. A stableFunction should not perform any i/o. This includes network and disk i/o. A stableFunction is supposed to be small and self-contained. It's what the literature calls a "pure lambda function".

          indexableExpression - An expression, written in sql which satisfies the following conditions: 1) It may appear in the WHERE clause of a SELECT statement. 2) It contains no subqueries. 2) Its resolved datatype is a linearDatatype. 3) It is composed entirely of stableOperators, stableFunctions, constants, and columns from a single row from a single base table.

          indexedExpression - An indexableExpression on which an index has been built.

          This feature then lets users declare index columns to be indexableExpressions built up from columns in the base row. The feature also provides language support for marking indexedExpressions in queries as sargs (search arguments), for selecting the optimal join strategy given these sargs, and then evaluating the resultant query plan.

          The CREATE INDEX syntax would change as follows:

          CREATE [UNIQUE] INDEX index-Name
          ON table-Name ( indexableExpression [ ASC | DESC ]
          [ , indexableExpression [ ASC | DESC ]] * )

          Here are some sketches for how we might implement this feature. Hopefully, the Store experts will respond with better suggestions:

          1) Invisible Columns. In this approach, declaring an expression as an
          indexed column would cause the Language to create an invisible
          column in the table. At INSERT and UPDATE time, the Language would
          compute the value of this column as necessary. We would need some
          Store support for dropping these invisible columns when the
          corresponding index is dropped.

          2) Stored Expressions. In this approach, the stored index definition
          would include a compiled version of the expression
          which the Store would run to calculate the index column value. This
          approach probably creates some complexity at recovery and upgrade
          time. It also creates opportunities for Store corruption
          if users include mischievous functions in their
          indexableExpressions. Again, this would be particularly problematic
          at recovery time.

          Show
          Rick Hillegas added a comment - Here's a stab at formalizing this request a bit. First we start out with some definitions: linearDatatype - Any datatype on which you can build a Btree index stableOperator - A side-effect free SQL operator which always returns the same value when given the same operands. This includes the arithmetic, string, and boolean operators: +, -, *, /, <, >, =, CAST, ||. stableFunction - A side-effect-free function which always returns the same value when given the same parameters. Being side-effect-free means that a stableFunction may not alter the database environment. For instance, it may not issue queries or alter the connection in any way. A stableFunction should not perform any i/o. This includes network and disk i/o. A stableFunction is supposed to be small and self-contained. It's what the literature calls a "pure lambda function". indexableExpression - An expression, written in sql which satisfies the following conditions: 1) It may appear in the WHERE clause of a SELECT statement. 2) It contains no subqueries. 2) Its resolved datatype is a linearDatatype. 3) It is composed entirely of stableOperators, stableFunctions, constants, and columns from a single row from a single base table. indexedExpression - An indexableExpression on which an index has been built. This feature then lets users declare index columns to be indexableExpressions built up from columns in the base row. The feature also provides language support for marking indexedExpressions in queries as sargs (search arguments), for selecting the optimal join strategy given these sargs, and then evaluating the resultant query plan. The CREATE INDEX syntax would change as follows: CREATE [UNIQUE] INDEX index-Name ON table-Name ( indexableExpression [ ASC | DESC ] [ , indexableExpression [ ASC | DESC ]] * ) Here are some sketches for how we might implement this feature. Hopefully, the Store experts will respond with better suggestions: 1) Invisible Columns. In this approach, declaring an expression as an indexed column would cause the Language to create an invisible column in the table. At INSERT and UPDATE time, the Language would compute the value of this column as necessary. We would need some Store support for dropping these invisible columns when the corresponding index is dropped. 2) Stored Expressions. In this approach, the stored index definition would include a compiled version of the expression which the Store would run to calculate the index column value. This approach probably creates some complexity at recovery and upgrade time. It also creates opportunities for Store corruption if users include mischievous functions in their indexableExpressions. Again, this would be particularly problematic at recovery time.

            People

            • Assignee:
              Unassigned
              Reporter:
              simmi iyer
            • Votes:
              4 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:

                Development