Derby
  1. Derby
  2. DERBY-1773

insertRow() and updateRow() fail with syntax error when column has an alias

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.2.1.6
    • Fix Version/s: None
    • Component/s: JDBC
    • Urgency:
      Normal
    • Issue & fix info:
      Repro attached

      Description

      When the select query used in an updatable result set has column aliases, a syntax error is thrown when executing ResultSet.insertRow() and ResultSet.updateRow(). The problem is seen on embedded and client. Repro is attached.

      Exception in thread "main" ERROR 42X14: 'A1' is not a column in table or VTI 'APP.T'.
      at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:316)
      at org.apache.derby.impl.sql.compile.ResultColumn.bindResultColumnByName(ResultColumn.java:677)
      at org.apache.derby.impl.sql.compile.ResultColumnList.bindResultColumnsByName(ResultColumnList.java:682)
      at org.apache.derby.impl.sql.compile.ResultSetNode.bindResultColumns(ResultSetNode.java:683)
      at org.apache.derby.impl.sql.compile.SelectNode.bindResultColumns(SelectNode.java:742)
      at org.apache.derby.impl.sql.compile.UpdateNode.bind(UpdateNode.java:349)
      at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:345)
      at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:111)
      at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:723)
      at org.apache.derby.impl.jdbc.EmbedResultSet.updateRow(EmbedResultSet.java:3734)
      at Alias.main(Alias.java:15)

      1. updatedToHeadMarch2010.diff
        7 kB
        Bryan Pendleton
      2. NoUpdatesToAliasedColumnsWithTest.diff
        7 kB
        Bryan Pendleton
      3. ASF.LICENSE.NOT.GRANTED--Alias.java
        0.6 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Hide
          Bryan Pendleton added a comment -

          Just updated the patch to the latest trunk head to resolve a few merges, and re-attached it.

          Show
          Bryan Pendleton added a comment - Just updated the patch to the latest trunk head to resolve a few merges, and re-attached it.
          Hide
          Knut Anders Hatlen added a comment -

          My understanding is that "FOR UPDATE OF C1,C2" means that the cursor should be updatable, but you can only update the columns C1 and C2 using that cursor. If there's an alias on C1 or C2, the statement should fail. Aliases on other columns than C1 and C2 should be allowed, and the cursor should still be updatable if they are aliased. If I read the patch correctly, it will make the cursor read only as long as there is at least one column that's aliased. I don't think it's a big usability issue if we choose to disallow aliases altogether on updatable cursors, but that would require changes to the reference manual [1] and probably also warrant a release note, so I'd prefer not to impose that restriction.

          I agree that the two different ways of aliasing (C2 AS A2 vs T1 AS A(A1,A2)) are equivalent with regard to this issue and should be handled the same way.

          I also agree that FOR UPDATE with no column list, or an implicit FOR UPDATE clause (DERBY-231), means that all columns are updatable. Perhaps the alternative, more dynamic semantics are more convenient in some cases, but we already have syntax for making just a subset of the columns updatable, so I don't think it adds much value.

          [1] http://db.apache.org/derby/docs/dev/ref/rrefsqlj41360.html#rrefsqlj41360__sqlj15384

          Show
          Knut Anders Hatlen added a comment - My understanding is that "FOR UPDATE OF C1,C2" means that the cursor should be updatable, but you can only update the columns C1 and C2 using that cursor. If there's an alias on C1 or C2, the statement should fail. Aliases on other columns than C1 and C2 should be allowed, and the cursor should still be updatable if they are aliased. If I read the patch correctly, it will make the cursor read only as long as there is at least one column that's aliased. I don't think it's a big usability issue if we choose to disallow aliases altogether on updatable cursors, but that would require changes to the reference manual [1] and probably also warrant a release note, so I'd prefer not to impose that restriction. I agree that the two different ways of aliasing (C2 AS A2 vs T1 AS A(A1,A2)) are equivalent with regard to this issue and should be handled the same way. I also agree that FOR UPDATE with no column list, or an implicit FOR UPDATE clause ( DERBY-231 ), means that all columns are updatable. Perhaps the alternative, more dynamic semantics are more convenient in some cases, but we already have syntax for making just a subset of the columns updatable, so I don't think it adds much value. [1] http://db.apache.org/derby/docs/dev/ref/rrefsqlj41360.html#rrefsqlj41360__sqlj15384
          Hide
          Bryan Pendleton added a comment -

          Attached is a patch proposal. I'm still running tests, and would
          appreciate suggestions for more tests.

          This patch makes two basic changes:
          1) It modifies ResultColumnList.markUpdatableByCursor so that
          it only marks a ResultColumn as updatable if it is not aliased,
          that is, if the actual column name of its ColumnReference matches
          the ResultColumn name.
          2) It modifies SelectNode.isUpdatableCursor to call the new method
          FromTable.columnsAreUpdatable, which checks the columns to
          see if they have been aliased, in which case the select is NOT updatable.

          There are various aspects to this code that I'm not completely clear on.
          For one thing, I'm not clear on the differences between the following
          two techniques for introducing column aliasing:

          SELECT c1 as a1, c2 as a2 from t1 as abcde

          versus

          select * from t1 as a(a1,a2)

          They seem to take substantially different paths through the code and
          the resulting data structures are different, so the two basic changes that
          I made seemed necessary, as the first change handles the first statement,
          and the second change handles the second statement.

          Secondly, I'm not clear on the overall semantics of "the ResultSet is updatable"
          versus "this particular column is updatable". It seems like the FOR UPDATE OF
          clause allows for the specification of a particular subset of columns which
          are updatable, but one can also simply say "FOR UPDATE", or can in fact
          omit the clause entirely (this is DERBY-231), in which case the implication is,
          I guess, that all columns must be updatable in order for the ResultSet to be updatable?

          Or is it more dynamic? Is the intent that, if I say "FOR UPDATE OF C1,C2", then
          only C1 and C2 are updatable, but if I say just "FOR UPDATE" or omit it entirely,
          then so long as at least one column is updatable, the ResultSet should be updatable,
          and the computation of updatability must be deferred until the actual attempt to
          perform an update (by calling rs.updateXXX) is performed?

          As I said, it would be great to have suggestions for more test cases to add!

          Show
          Bryan Pendleton added a comment - Attached is a patch proposal. I'm still running tests, and would appreciate suggestions for more tests. This patch makes two basic changes: 1) It modifies ResultColumnList.markUpdatableByCursor so that it only marks a ResultColumn as updatable if it is not aliased, that is, if the actual column name of its ColumnReference matches the ResultColumn name. 2) It modifies SelectNode.isUpdatableCursor to call the new method FromTable.columnsAreUpdatable, which checks the columns to see if they have been aliased, in which case the select is NOT updatable. There are various aspects to this code that I'm not completely clear on. For one thing, I'm not clear on the differences between the following two techniques for introducing column aliasing: SELECT c1 as a1, c2 as a2 from t1 as abcde versus select * from t1 as a(a1,a2) They seem to take substantially different paths through the code and the resulting data structures are different, so the two basic changes that I made seemed necessary, as the first change handles the first statement, and the second change handles the second statement. Secondly, I'm not clear on the overall semantics of "the ResultSet is updatable" versus "this particular column is updatable". It seems like the FOR UPDATE OF clause allows for the specification of a particular subset of columns which are updatable, but one can also simply say "FOR UPDATE", or can in fact omit the clause entirely (this is DERBY-231 ), in which case the implication is, I guess, that all columns must be updatable in order for the ResultSet to be updatable? Or is it more dynamic? Is the intent that, if I say "FOR UPDATE OF C1,C2", then only C1 and C2 are updatable, but if I say just "FOR UPDATE" or omit it entirely, then so long as at least one column is updatable, the ResultSet should be updatable, and the computation of updatability must be deferred until the actual attempt to perform an update (by calling rs.updateXXX) is performed? As I said, it would be great to have suggestions for more test cases to add!
          Hide
          Bryan Pendleton added a comment -

          Thanks Knut, I think you are exactly right. I'll pursue the "refuse it with an error"
          approach, and I'll try to include test cases both with and without FOR UPDATE.

          Thanks for the pointer to DERBY-231, the behavior is making more sense now.

          Show
          Bryan Pendleton added a comment - Thanks Knut, I think you are exactly right. I'll pursue the "refuse it with an error" approach, and I'll try to include test cases both with and without FOR UPDATE. Thanks for the pointer to DERBY-231 , the behavior is making more sense now.
          Hide
          Knut Anders Hatlen added a comment -

          It made no difference to add FOR UPDATE to the repro. When I added FOR UPDATE OF X, I got NPE/assert failure.

          Show
          Knut Anders Hatlen added a comment - It made no difference to add FOR UPDATE to the repro. When I added FOR UPDATE OF X, I got NPE/assert failure.
          Hide
          Knut Anders Hatlen added a comment -

          Could the difference be explicit vs implicit FOR UPDATE clause? Ref manual says that correlation name cannot be used on updatable columns. http://db.apache.org/derby/docs/dev/ref/rrefcorrelationname.html

          Updatable result sets used to require FOR UPDATE until DERBY-231. The test case in UpdatableResultSetTest appended "FOR UPDATE of c1" to the select statement, whereas the repro doesn't use FOR UPDATE.

          Show
          Knut Anders Hatlen added a comment - Could the difference be explicit vs implicit FOR UPDATE clause? Ref manual says that correlation name cannot be used on updatable columns. http://db.apache.org/derby/docs/dev/ref/rrefcorrelationname.html Updatable result sets used to require FOR UPDATE until DERBY-231 . The test case in UpdatableResultSetTest appended "FOR UPDATE of c1" to the select statement, whereas the repro doesn't use FOR UPDATE.
          Hide
          Bryan Pendleton added a comment -

          Interestingly, UpdatableResultSetTest.testDeleteRowWithCorrelationForColumnName()
          has some very similar, but not identical, scenarios.

          That test case appears to state that updating a column which has an alias should
          not be allowed.

          I'll study this test case in more detail, and try to figure out why we refuse the statement
          in that case, but not in the scenario from the repro script.

          Show
          Bryan Pendleton added a comment - Interestingly, UpdatableResultSetTest.testDeleteRowWithCorrelationForColumnName() has some very similar, but not identical, scenarios. That test case appears to state that updating a column which has an alias should not be allowed. I'll study this test case in more detail, and try to figure out why we refuse the statement in that case, but not in the scenario from the repro script.
          Hide
          Bryan Pendleton added a comment -

          I've been investigating this issue a bit, and want to note
          what I've found, as well as several ideas.

          Firstly, the core of the issue occurs in EmbedResultSet.updateRow, which is
          attempting to construct and execute, on the fly, a statement like:

          update t set x = 2 where current of my-cursor-name

          The problem arises due to the fact that the select statement:

          select * from t as a(a1)

          has introduced aliases for both the table (T => A) and the column (X => A1)

          Now, when EmbedResultSet.updateRow processes the table, it is careful
          to use the true underlying base table name (T):

          updateWhereCurrentOfSQL.append(getFullBaseTableName(targetTable));//got the underlying (schema.)table name

          The problem arises a bit later, when it tries to access the columns which
          are being updated:

          for (int i=1; i<=resultDescription.getColumnCount(); i++) { //in this for loop we are constructing columnname=?,... part of the update sql
          if (columnGotUpdated[i-1])

          { //if the column got updated, do following if (foundOneColumnAlready) updateWhereCurrentOfSQL.append(","); //using quotes around the column name to preserve case sensitivity updateWhereCurrentOfSQL.append(IdUtil.normalToDelimited( resultDescription.getColumnDescriptor(i).getName()) + "=?"); foundOneColumnAlready = true; }

          }

          The issue here is that resultDescription.getColumnDescriptor is a
          GenericColumnDescriptor, which was constructed from a ResultColumnDescriptor
          by the method ResultColumnList.makeResultDescriptors(), and this code doesn't
          have enough "smarts" to understand the difference between the column's "exposedName" (A1)
          and its true base column name (X).

          So, I can see two possibilities:
          1) Enhance GenericColumnDescriptor, ResultColumnDescriptor, and (probably) ResultColumn,
          so that we have a new method GenericColumnDescriptor.getBaseColumnName(), and call
          that new method from EmbedResultSet.updateRow.
          2) Refuse the statement. Change (probably) SelectNode.isUpdatableCursor() so that it
          looks through its ResultColumnList to see if any of the result columns have been aliased,
          and state that a SELECT with aliased column names is NOT a legal updatable SELECT statement.

          I'll investigate both possibilities further, just wanted to note my results so far.

          Show
          Bryan Pendleton added a comment - I've been investigating this issue a bit, and want to note what I've found, as well as several ideas. Firstly, the core of the issue occurs in EmbedResultSet.updateRow, which is attempting to construct and execute, on the fly, a statement like: update t set x = 2 where current of my-cursor-name The problem arises due to the fact that the select statement: select * from t as a(a1) has introduced aliases for both the table (T => A) and the column (X => A1) Now, when EmbedResultSet.updateRow processes the table, it is careful to use the true underlying base table name (T): updateWhereCurrentOfSQL.append(getFullBaseTableName(targetTable));//got the underlying (schema.)table name The problem arises a bit later, when it tries to access the columns which are being updated: for (int i=1; i<=resultDescription.getColumnCount(); i++) { //in this for loop we are constructing columnname=?,... part of the update sql if (columnGotUpdated [i-1] ) { //if the column got updated, do following if (foundOneColumnAlready) updateWhereCurrentOfSQL.append(","); //using quotes around the column name to preserve case sensitivity updateWhereCurrentOfSQL.append(IdUtil.normalToDelimited( resultDescription.getColumnDescriptor(i).getName()) + "=?"); foundOneColumnAlready = true; } } The issue here is that resultDescription.getColumnDescriptor is a GenericColumnDescriptor, which was constructed from a ResultColumnDescriptor by the method ResultColumnList.makeResultDescriptors(), and this code doesn't have enough "smarts" to understand the difference between the column's "exposedName" (A1) and its true base column name (X). So, I can see two possibilities: 1) Enhance GenericColumnDescriptor, ResultColumnDescriptor, and (probably) ResultColumn, so that we have a new method GenericColumnDescriptor.getBaseColumnName(), and call that new method from EmbedResultSet.updateRow. 2) Refuse the statement. Change (probably) SelectNode.isUpdatableCursor() so that it looks through its ResultColumnList to see if any of the result columns have been aliased, and state that a SELECT with aliased column names is NOT a legal updatable SELECT statement. I'll investigate both possibilities further, just wanted to note my results so far.
          Hide
          Knut Anders Hatlen added a comment -

          Triaged for 10.5.2.

          Show
          Knut Anders Hatlen added a comment - Triaged for 10.5.2.

            People

            • Assignee:
              Bryan Pendleton
              Reporter:
              Knut Anders Hatlen
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:

                Development