Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 10.11.0.0
    • Component/s: SQL
    • Urgency: Normal
    • Issue & fix info: Patch Available

      Description

      In many situations it is desirable to have constraint checking take place only at transaction commit time, and not before. If, for example, there is a chain of foreign key constraints between tables, insert statements have to be ordered to avoid constraint violations. If foreign key references are circular, the user has to split the DML into insert statements followed by update statements.
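
      For illustration, a minimal SQL sketch of the circular case described above (table, column, and constraint names are hypothetical):

        CREATE TABLE emp (id INT PRIMARY KEY, dept_id INT);
        CREATE TABLE dept (id INT PRIMARY KEY,
                           mgr INT REFERENCES emp(id));
        ALTER TABLE emp ADD CONSTRAINT emp_fk
            FOREIGN KEY (dept_id) REFERENCES dept(id);

        -- With immediate checking, neither table's row can go in first,
        -- so the user must insert a NULL and patch it up afterwards:
        INSERT INTO emp VALUES (1, NULL);
        INSERT INTO dept VALUES (10, 1);
        UPDATE emp SET dept_id = 10 WHERE id = 1;

        -- With deferred checking, the foreign key checks would instead run
        -- at commit, and the two inserts could be issued in either order.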

      In other words, with deferred constraint checking, life is much easier for the user. The lack of it also creates problems for software such as object-relational mapping tools, which are not prepared to order statements and thus depend on deferred constraint checking.

      Attachments

      1. deferredConstraints.html
        15 kB
        Dag H. Wanvik
      2. deferredConstraints.html
        16 kB
        Dag H. Wanvik
      3. derby-532-syntax-binding-dict-1.diff
        89 kB
        Dag H. Wanvik
      4. derby-532-syntax-binding-dict-1.status
        3 kB
        Dag H. Wanvik
      5. derby-532-syntax-binding-dict-2.diff
        57 kB
        Dag H. Wanvik
      6. derby-532-syntax-binding-dict-2.status
        2 kB
        Dag H. Wanvik
      7. derby-532-syntax-binding-dict-all-1.diff
        118 kB
        Dag H. Wanvik
      8. derby-532-testAlterConstraintInvalidation.diff
        4 kB
        Dag H. Wanvik
      9. derby-532-testAlterConstraintInvalidation.status
        0.4 kB
        Dag H. Wanvik
      10. derby-532-unique-pk-1.diff
        222 kB
        Dag H. Wanvik
      11. derby-532-unique-pk-1.status
        6 kB
        Dag H. Wanvik
      12. deferredConstraints.html
        18 kB
        Dag H. Wanvik
      13. derby-532-unique-pk-2.diff
        222 kB
        Dag H. Wanvik
      14. derby-532-xa-1.diff
        17 kB
        Dag H. Wanvik
      15. deferredConstraints.html
        19 kB
        Dag H. Wanvik
      16. derby-532-xa-2.diff
        19 kB
        Dag H. Wanvik
      17. derby-532-import-1.diff
        22 kB
        Dag H. Wanvik
      18. derby-532-import-1.status
        1.0 kB
        Dag H. Wanvik
      19. derby-532-import-2.diff
        22 kB
        Dag H. Wanvik
      20. derby-532-unique-pk-3.diff
        223 kB
        Dag H. Wanvik
      21. derby-532-import-3.status
        0.9 kB
        Dag H. Wanvik
      22. derby-532-import-3.diff
        21 kB
        Dag H. Wanvik
      23. derby-532-xa-3.diff
        19 kB
        Dag H. Wanvik
      24. derby-532-xa-3.status
        1.0 kB
        Dag H. Wanvik
      25. derby-532-unique-pk-3.status
        0.1 kB
        Dag H. Wanvik
      26. derby-532-more-tests-1.diff
        9 kB
        Dag H. Wanvik
      27. derby-532-more-tests-1.stat
        0.5 kB
        Dag H. Wanvik
      28. derby-532-serializable-scan-1.diff
        223 kB
        Dag H. Wanvik
      29. derby-532-serializable-scan-2.diff
        186 kB
        Dag H. Wanvik
      30. derby-532-serializable-scan-2.stat
        4 kB
        Dag H. Wanvik
      31. derby-532-post-scan-1.diff
        179 kB
        Dag H. Wanvik
      32. derby-532-post-scan-1.stat
        4 kB
        Dag H. Wanvik
      33. derby-532-post-scan-2.diff
        184 kB
        Dag H. Wanvik
      34. derby-532-post-scan-2.stat
        4 kB
        Dag H. Wanvik
      35. IndexRowGenerator.html
        53 kB
        Dag H. Wanvik
      36. IndexDescriptorImpl.html
        44 kB
        Dag H. Wanvik
      37. IndexDescriptor.html
        20 kB
        Dag H. Wanvik
      38. derby-532-post-scan-3.stat
        4 kB
        Dag H. Wanvik
      39. SortObserver.html
        20 kB
        Dag H. Wanvik
      40. derby-532-post-scan-3.diff
        185 kB
        Dag H. Wanvik
      41. derby-532-post-scan-4.stat
        4 kB
        Dag H. Wanvik
      42. derby-532-post-scan-4.diff
        188 kB
        Dag H. Wanvik
      43. deferredConstraints.html
        18 kB
        Dag H. Wanvik
      44. derby-532-test-with-default-deferrable-all-over.diff
        5 kB
        Dag H. Wanvik
      45. derby-532-fix-metadata-1.diff
        10 kB
        Dag H. Wanvik
      46. derby-532-fix-metadata-1.status
        0.4 kB
        Dag H. Wanvik
      47. derby-532-fix-drop-not-nullable.diff
        4 kB
        Dag H. Wanvik
      48. derby-532-fix-drop-not-nullable.status
        0.5 kB
        Dag H. Wanvik
      49. derby-532-nullableUniqueFix.diff
        10 kB
        Dag H. Wanvik
      50. derby-532-nullableUniqueFix.status
        0.7 kB
        Dag H. Wanvik
      51. derby-532-test-speedup.diff
        22 kB
        Dag H. Wanvik
      52. derby-532-test-speedup.status
        1.0 kB
        Dag H. Wanvik
      53. derby-532-allow-pk-unique-1.status
        0.6 kB
        Dag H. Wanvik
      54. derby-532-allow-pk-unique-1.diff
        21 kB
        Dag H. Wanvik
      55. derby-532-upgrade-1.diff
        4 kB
        Dag H. Wanvik
      56. derby-532-upgrade-1.status
        0.3 kB
        Dag H. Wanvik
      57. derby-532-upgrade-1b.diff
        5 kB
        Dag H. Wanvik
      58. derby-532-check-constraints-1.diff
        283 kB
        Dag H. Wanvik
      59. derby-532-check-constraints-1.stat
        5 kB
        Dag H. Wanvik
      60. derby-532-check-constraints-2.diff
        288 kB
        Dag H. Wanvik
      61. derby-532-check-constraints-2.stat
        5 kB
        Dag H. Wanvik
      62. derby-532-fk-baseline.diff
        3 kB
        Dag H. Wanvik
      63. derby-532-fk-baseline-2.diff
        23 kB
        Dag H. Wanvik
      64. derby-532-fk-1.diff
        111 kB
        Dag H. Wanvik
      65. derby-532-fk-3.diff
        139 kB
        Dag H. Wanvik
      66. derby-532-fk-3.stat
        2 kB
        Dag H. Wanvik
      67. derby-532-fk-4.diff
        192 kB
        Dag H. Wanvik
      68. derby-532-fk-5.diff
        199 kB
        Dag H. Wanvik
      69. derby-532-fk-5.stat
        2 kB
        Dag H. Wanvik

        Issue Links

          Activity

          Dag H. Wanvik added a comment (edited)

          Uploading a functional specification for this, with some implementation ideas. I'd like to try to build deferrable unique constraints in a first increment, but the specification covers all supported types.

          It seems desirable to avoid dropping a unique constraint's supporting index when only a few rows are modified; options include temporarily making the index non-unique (hard?) and reverting at commit, or using a temporary shadow table (which would complicate updates and selects).

          Rick Hillegas added a comment

          Linking this issue to DERBY-6303. The reasons for deferring unique constraints probably apply to deferring unique index enforcement too.

          Dag H. Wanvik added a comment

          Uploading 1.4 of the specification (adding note on NOT NULL).

          Dag H. Wanvik added a comment (edited)

          Uploading a patch, derby-532-syntax-binding-dict-1. My local git commit record says:

          Patch derby-532-syntax-binding-dict-1. This patch wires in the syntax
          for deferred constraints (aka "constraint characteristics"). It also
          does

          • binding checks for CREATE TABLE constraints and the (new) SET
            CONSTRAINTS statement. Binding is still missing for ALTER TABLE
            constraint clauses.
          • temporarily throws "not yet implemented" (0A000.S) for all usage
            except when the characteristics coincide with the current (and
            future) Derby defaults, i.e. NOT DEFERRABLE [INITIALLY IMMEDIATE]
            ENFORCED
          • checks for inconsistencies in characteristics (illegal
            combinations), cf. 42X97 "Conflicting constraint characteristics
            for constraint"
          • implements the implied DEFERRABLE if (only) INITIALLY DEFERRED is
            specified.
          • if the property "derby.constraintsTesting" is set, persists the
            characteristics to the dictionary by overloading the existing
            STATE character according to the specification. This property
            will go away once the feature set is implemented; it exists only
            to make it possible to test dictionary persistence
          • throws 42XAK "Constraint characteristics not allowed for NOT NULL."
            for NOT NULL characteristics since this constraint type is not
            explicitly implemented as a constraint in Derby (yet, at least)
          • adds a new test, ConstraintCharacteristicsTest, to test the above
            and wires it into the lang suite.
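
          For concreteness, a hedged SQL sketch of the behaviors listed above (statement shapes follow this description; table and constraint names are made up):

            -- Accepted: coincides with the Derby defaults
            CREATE TABLE t1 (i INT, CONSTRAINT c1 PRIMARY KEY (i)
                NOT DEFERRABLE INITIALLY IMMEDIATE ENFORCED);

            -- Rejected for now with 0A000.S (not yet implemented)
            CREATE TABLE t2 (i INT, CONSTRAINT c2 PRIMARY KEY (i) DEFERRABLE);

            -- Rejected with 42X97 (conflicting characteristics)
            CREATE TABLE t3 (i INT, CONSTRAINT c3 PRIMARY KEY (i)
                INITIALLY DEFERRED NOT DEFERRABLE);

            -- INITIALLY DEFERRED alone implies DEFERRABLE
            CREATE TABLE t4 (i INT, CONSTRAINT c4 PRIMARY KEY (i)
                INITIALLY DEFERRED);

            -- Rejected with 42XAK: characteristics on NOT NULL
            CREATE TABLE t5 (i INT NOT NULL DEFERRABLE);
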
          Dag H. Wanvik added a comment

          Just to be clear, the patch is of course not yet committed to the Apache repos. Regressions ran ok.
          Do people think it's OK to build and commit this feature incrementally and just keep NOT IMPLEMENTED messages until everything is in place? I do not yet know how much work it will turn out to be in the lower layers.

          Rick Hillegas added a comment

          Hi Dag,

          That approach sounds fine to me, particularly since you're incrementally adding tests as you go along. Thanks.

          Dag H. Wanvik added a comment

          Thanks, Rick. Building more tests, I noticed I had made the ALTER TABLE ALTER CONSTRAINT syntax too lenient; I'll fix that in the next version of the patch.
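
          For reference, the statement shape in question is roughly the following (a hedged sketch, per the attached functional specification; table and constraint names are illustrative):

            ALTER TABLE t ALTER CONSTRAINT c ENFORCED;
            ALTER TABLE t ALTER CONSTRAINT c NOT ENFORCED;
            -- Per the spec, only constraint characteristics may follow the
            -- constraint name; the earlier grammar accepted more than this.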

          Dag H. Wanvik added a comment (edited)

          Uploading derby-532-syntax-binding-dict-2, which builds on patch #1 (apply consecutively).

          Elaborates ALTER TABLE ALTER CONSTRAINT, correcting the syntax, adding name binding, and updating the dictionary (subject to the property derby.constraintsTesting) in a new AlterConstraintConstantAction class.

          Built out the tests in ConstraintCharacteristicsTest to systematically check all possible characteristics combinations in all cases, and added tests for ALTER TABLE ALTER CONSTRAINT, including dictionary updates.

          Regressions ran ok.

          Dag H. Wanvik added a comment

          With the two patches above finalized, I should be able to start thinking about actually deferring the constraints. As a first step, I'll try to understand how to add machinery for checking constraints at commit time rather than at statement execution time; maybe, for a check constraint, by first disabling it (or dropping it if it comes into play during execution) and then checking all rows with a SELECT, like what is done when adding a check constraint. In general I'd want to implement a naive approach first and then later try to do something performant. Perhaps for small tables a naive approach could even be faster; time will tell.
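
          A sketch of that naive commit-time check, assuming a hypothetical deferred check constraint CHECK (col > 0) on table t; this is the same kind of scan run when a check constraint is added to a populated table:

            -- Count rows that definitely violate the condition; NULLs
            -- satisfy a CHECK constraint, so they are filtered out here.
            SELECT COUNT(*) FROM t WHERE NOT (col > 0);
            -- A nonzero count at commit time would mean the deferred
            -- constraint is violated and the transaction must roll back.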

          Rick Hillegas added a comment

          Thanks for the patches, Dag. I was unable to apply them, maybe because the diff was produced by git? When I try to apply the first patch via this command...

          patch -p 0 -i $patchFile

          ...I get this error message:

          can't find file to patch at input line 5
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------

          diff --git a/java/engine/org/apache/derby/iapi/sql/dictionary/CheckConstraintDescriptor.java b/java/engine/org/apache/derby/iapi/sql/dictionary/CheckConstraintDescriptor.java
          index 6378411..ceb2a99 100644
          --- a/java/engine/org/apache/derby/iapi/sql/dictionary/CheckConstraintDescriptor.java
          +++ b/java/engine/org/apache/derby/iapi/sql/dictionary/CheckConstraintDescriptor.java
          --------------------------

          Maybe I need to give the patch command another switch or argument?

          Nevertheless, I did go through the patch files and have a couple of small comments:

          ---------

          SYSCONSTRAINTSRowFactory:

          o Could probably use a comment somewhere explaining the encoding
          scheme for SYSCONSTRAINTS.STATE. Extra credit if the explanation is
          something which would be easy to copy into the Reference Guide section
          on this catalog. And/or update the functional spec accordingly.

          ---------

          AlterConstraintConstantAction:

          o Does this action need to invalidate statements?

          Thanks,
          -Rick

          Dag H. Wanvik added a comment (edited)

          Thanks, Rick. You need the -p1 option to patch for git-generated diffs. I'll add more comments about the encoding scheme.
          As for invalidation, I am not sure yet; it will depend on whether we generate different code for enforced/not enforced. Most likely, the answer will be "yes".

          Dag H. Wanvik added a comment (edited)

          Uploading derby-532-syntax-binding-dict-all-1, which combines the first two patches, adjusts them to work on current trunk, and adds the extra documentation requested by Rick (in SYSCONSTRAINTSRowFactory). Regressions went ok.

          ASF subversion and git services added a comment

          Commit 1517823 from Dag H. Wanvik in branch 'code/trunk'
          [ https://svn.apache.org/r1517823 ]

          DERBY-532 Support deferrable constraints

          Patch derby-532-syntax-binding-dict-all-1. This patch wires in the syntax
          for deferred constraints (aka "constraint characteristics"). It also
          does

          • binding checks for CREATE TABLE constraints and the (new) SET
            CONSTRAINTS statement. Binding is still missing for ALTER TABLE
            constraint clauses.
          • temporarily throws "not yet implemented" (0A000.S) for all usage
            except when the characteristics coincide with the current (and
            future) Derby defaults, i.e. NOT DEFERRABLE [INITIALLY IMMEDIATE]
            ENFORCED
          • checks for inconsistencies in characteristics (illegal
            combinations), cf. 42X97 "Conflicting constraint characteristics
            for constraint"
          • implements the implied DEFERRABLE if (only) INITIALLY DEFERRED is
            specified.
          • if the property "derby.constraintsTesting" is set, persists the
            characteristics to the dictionary by overloading the existing
            STATE character according to the specification. This property
            will go away once the feature set is implemented; it exists only
            to make it possible to test dictionary persistence
          • throws 42XAK "Constraint characteristics not allowed for NOT NULL."
            for NOT NULL characteristics since this constraint type is not
            explicitly implemented as a constraint in Derby (yet, at least)
          • adds a new test, ConstraintCharacteristicsTest, to test the above
            and wires it into the lang suite.

          (part 2):

          • Elaborates ALTER TABLE ALTER CONSTRAINT, correcting the syntax,
            adding name binding and updating the dictionary (subject to the
            property derby.constraintsTesting) in a new
            AlterConstraintConstantAction class.
          • Built out the tests in ConstraintCharacteristicsTest to
            systematically check all possible characteristics combinations
            in all cases, and added tests for ALTER TABLE ALTER CONSTRAINT,
            including dictionary updates.
          Dag H. Wanvik added a comment

          Committed derby-532-syntax-binding-dict-all-1 at svn 1517823.

          Dag H. Wanvik added a comment

          Uploading patch derby-532-testAlterConstraintInvalidation.

          Adds a fixture to test that a prepared statement is invalidated
          when a table it depends on undergoes an ALTER TABLE ALTER
          CONSTRAINT statement. As it turns out, this is already handled by
          the common machinery for ALTER TABLE.

          ASF subversion and git services added a comment

          Commit 1519045 from Dag H. Wanvik in branch 'code/trunk'
          [ https://svn.apache.org/r1519045 ]

          DERBY-532: Support deferrable constraints

          Patch derby-532-testAlterConstraintInvalidation.

          Adds a fixture to test that a prepared statement is invalidated
          when a table it depends on undergoes an ALTER TABLE ALTER
          CONSTRAINT statement. As it turns out, this is already handled by
          the common machinery for ALTER TABLE.

          Dag H. Wanvik added a comment (edited)

          This experimental patch enables deferrable constraints for

          a) primary key constraints
          b) unique constraints with non-nullable columns
          c) unique constraints with nullable columns

          via new logic in the supporting BTree indexes and sorts.

          The patch includes relaxing the constraint at insertion and update time, as well as adding a constraint to an existing table. There is as yet no support for importing data into a table with a deferred constraint, cf. the issue DERBY-6374. When that issue is fixed, I'll add support for deferrable constraints for import as well.

          Derby treats constraints a) and b) the same, and in the code these are marked as "unique" when they are not deferrable (as in existing code).

          Constraint type c) is currently marked as "uniqueWithDuplicateNulls". Insert/update of these is implemented in the BTree by including the RowLocation of the base row in the set of keys in the index row (DERBY-3330). This makes them trivially unique, but there is an extra code path in BTreeController that checks neighbor rows for duplicates, and only allows insertion if the key contains a null. When adding a constraint to an existing table, these are handled by a specially crafted sorter (UniqueWithDuplicateNullsMergeSort).
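
          For reference, a sketch of the existing type c) behavior that this logic builds on (table and constraint names are illustrative):

            CREATE TABLE t (i INT, CONSTRAINT c UNIQUE (i));
            INSERT INTO t VALUES (NULL);
            INSERT INTO t VALUES (NULL); -- ok: duplicate NULLs are allowed
            INSERT INTO t VALUES (1);
            INSERT INTO t VALUES (1);    -- fails: duplicate non-NULL key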

          The implementation of insert/update of deferrable indexes is based on the same approach, i.e. neighbor row comparison and tweaking the sorters. [This means a) and b), if deferrable, are no longer marked "unique".]

          Deferrable indexes are not shared.

          If there are duplicates and we are in deferred constraint mode (a dynamic session property), we save the duplicate index row in a disk-based hash table (DeferredDuplicates#rememberDuplicate).

          For a) and b), constraints which are deferrable are marked as "uniqueDeferrable" and "hasDeferrableChecking". Constraints of type c) which are deferrable are marked "uniqueWithDuplicateNulls" and "hasDeferrableChecking". These marks determine the code paths used. Note that existing indexes and non-deferrable constraints do not get a new code path, which should preserve their correctness and performance.

          Now, with these markers in place, deferral of checks happens in two places:

          IndexChanger#insertAndCheckDups + BTreeController#doIns

          and

          CreateIndexConstantAction#executeConstantAction + MergeSort#compare and UniqueWithDuplicateNullsMergeSort#compare

          The former is active for deferral under INSERT and UPDATE; the latter when adding a deferrable constraint to an existing table, where we sort the existing rows and detect any duplicates.

          At transaction commit (1), or when the constraint mode for a deferred constraint is changed back to immediate (2), we validate the constraint (DeferredDuplicates#validate) by replaying the hash table and scanning the index for the duplicate index rows to ascertain there are none; otherwise we raise an error, of transaction severity for (1) and statement severity for (2).
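
          A sketch of those two validation points in SQL, assuming autocommit off and a hypothetical deferrable constraint C on table T:

            SET CONSTRAINTS c DEFERRED;
            INSERT INTO t VALUES (1), (1); -- duplicate tolerated, remembered

            -- (2) Switching back to immediate validates at once; a remaining
            --     duplicate raises a statement-severity error here:
            SET CONSTRAINTS c IMMEDIATE;

            -- (1) Otherwise validation runs at COMMIT; an unresolved
            --     duplicate raises a transaction-severity error and the
            --     transaction rolls back.
            COMMIT;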

          The constraint mode is an SQL session level variable, and inside routines (nested connections) we push it on a stack. This means changes to the constraint mode inside nested connections are popped on routine exit. If, as part of this, a constraint changes from deferred to immediate mode, we also validate it for correctness. If this fails, the transaction rolls back (GenericLanguageConnectionContext#popNestedSessionContext calls #compareConstraintModes).

          #popNestedSessionContext is new: earlier we didn't need a hook for popping the SQL session context; we needed one in this case, so we implemented the general mechanism as well. That hook is called from GenericPreparedStatement#executeStmt. As part of this effort, we also renamed #setupNestedSessionContext to #pushNestedSessionContext.

          The test cases "testBasicDeferral" and "testRoutines" have been added to ConstraintCharacteristicsTest to test these basic behaviors.

          The "not enforced" feature is not yet implemented in this patch.

          Regressions passed OK with this patch.

          Patch details:

          new file: java/engine/org/apache/derby/impl/sql/execute/DeferredDuplicates.java

          New static class which handles much of the mechanics of saving duplicate index rows and later validating the index.

          new file: java/engine/org/apache/derby/impl/store/access/btree/index/B2I_10_4.java
          modified: java/engine/org/apache/derby/impl/store/access/btree/index/B2I.java
          modified: java/engine/org/apache/derby/impl/store/access/btree/index/B2IFactory.java
          modified: java/engine/org/apache/derby/iapi/services/io/RegisteredFormatIds.java
          modified: java/engine/org/apache/derby/iapi/services/io/StoredFormatIds.java

          The index conglomerate format has been bumped so we can store the deferral properties. Upgrade of indexes from old databases is lazy: they are not deferrable, and their format just lacks the two final properties, which default to false: hasDeferrableChecking and isUniqueDeferrable. The new format id is ACCESS_B2I_V6_ID. The new class B2I_10_4 makes sure we can write the old conglomerate format correctly in soft upgrade mode (in format ACCESS_B2I_V5_ID).

          modified: java/engine/org/apache/derby/iapi/store/raw/RawStoreFactory.java

          Added DERBY_STORE_MINOR_VERSION_11 which allows use of the new index conglomerate format.

          modified: java/engine/org/apache/derby/catalog/IndexDescriptor.java
          modified: java/engine/org/apache/derby/catalog/types/IndexDescriptorImpl.java
          modified: java/engine/org/apache/derby/iapi/sql/dictionary/IndexRowGenerator.java
          modified: java/engine/org/apache/derby/impl/sql/compile/TableElementList.java
          modified: java/engine/org/apache/derby/impl/sql/execute/AlterTableConstantAction.java
          modified: java/engine/org/apache/derby/impl/sql/execute/DDLSingleTableConstantAction.java
          modified: java/engine/org/apache/derby/impl/sql/execute/GenericConstantActionFactory.java
          modified: java/engine/org/apache/derby/impl/sql/execute/MaterializedResultSet.java
          modified: java/engine/org/apache/derby/impl/sql/execute/RIBulkChecker.java
          modified: java/engine/org/apache/derby/impl/sql/execute/RowChangerImpl.java
          modified: java/engine/org/apache/derby/impl/sql/execute/TemporaryRowHolderImpl.java
          modified: java/engine/org/apache/derby/impl/store/access/btree/BTree.java
          modified: java/engine/org/apache/derby/impl/store/access/btree/index/B2IController.java
          modified: java/engine/org/apache/derby/impl/store/access/PropertyConglomerate.java

          Addition/propagation of the "hasDeferrableChecking" and "uniqueDeferrable" properties, and addition of a "deferred=false" argument to ConglomerateController#insert calls.

          modified: java/engine/org/apache/derby/iapi/sql/conn/SQLSessionContext.java
          modified: java/engine/org/apache/derby/impl/sql/conn/SQLSessionContextImpl.java

          Storage of the session's current constraint modes and operations for pushing and clearing that information.

          modified: java/engine/org/apache/derby/impl/sql/compile/sqlgrammar.jj

          Add a check that we are at dictionary level 10_11 before allowing SET CONSTRAINTS to proceed. (Already done for deferred constraint creation.)

          modified: java/engine/org/apache/derby/iapi/store/access/ConglomerateController.java
          modified: java/engine/org/apache/derby/impl/store/access/btree/BTreeController.java
          modified: java/engine/org/apache/derby/impl/store/access/heap/HeapController.java
          modified: java/engine/org/apache/derby/iapi/store/access/DiskHashtable.java
          modified: java/engine/org/apache/derby/impl/sql/catalog/TabInfoImpl.java
          modified: java/engine/org/apache/derby/impl/sql/execute/IndexChanger.java

          Add boolean "deferred" to indicate current behavior to #insert. For BTreeController, report back duplicates but still do insert iff deferred mode. Ignored by HeapController. The real use of the new argument is by IndexChanger which knows about constraints in deferred mode.

          modified: java/engine/org/apache/derby/impl/sql/compile/CreateIndexNode.java

          New default arguments (doesn't use the deferred values).

          modified: java/engine/org/apache/derby/iapi/store/access/RowUtil.java

          Small formatting fix to #toString.

          modified: java/engine/org/apache/derby/iapi/store/access/SortObserver.java
          modified: java/engine/org/apache/derby/impl/sql/execute/BasicSortObserver.java
          modified: java/engine/org/apache/derby/impl/sql/execute/UniqueIndexSortObserver.java
          modified: java/engine/org/apache/derby/impl/sql/execute/UniqueWithDuplicateNullsIndexSortObserver.java
          modified: java/engine/org/apache/derby/impl/store/access/sort/MergeSort.java
          modified: java/engine/org/apache/derby/impl/store/access/sort/UniqueWithDuplicateNullsMergeSort.java

          Changes to accommodate saving duplicates during sorting of existing rows when adding an index for a constraint.

          modified: java/engine/org/apache/derby/impl/sql/GenericPreparedStatement.java
          modified: java/engine/org/apache/derby/impl/sql/compile/StaticMethodCallNode.java

          Call hook for popNestedSessionContext. Renaming of pushNestedSessionContext.

          modified: java/engine/org/apache/derby/impl/sql/catalog/DataDictionaryImpl.java

          Removed unnecessary overload of IndexRowGenerator (as part of adding more arguments anyway).

          modified: java/engine/org/apache/derby/impl/sql/execute/AlterConstraintConstantAction.java
          modified: java/engine/org/apache/derby/impl/sql/execute/CreateConstraintConstantAction.java

          Addition of dictionary level check + one bug fix.

          modified: java/engine/org/apache/derby/impl/sql/execute/CreateIndexConstantAction.java

          Determine the flags for the index conglomerates to support constraints, see introduction, and set up for creation of correct conglomerates. Handle any initial insertion of rows when constraint mode is initially deferred.

          modified: java/engine/org/apache/derby/impl/sql/execute/InsertResultSet.java

          Partial support (preparatory code) for bulk import [type a) and b)] in the presence of deferrable constraints, but see introduction.

          modified: java/engine/org/apache/derby/impl/sql/execute/SetConstraintsConstantAction.java

          Logic for setting the session's constraint mode at execution time.

          modified: java/engine/org/apache/derby/loc/messages.xml
          modified: java/shared/org/apache/derby/shared/common/reference/SQLState.java

          New error messages.

          modified: java/testing/org/apache/derbyTesting/functionTests/tests/lang/ConstraintCharacteristicsTest.java

          Added test cases "testBasicDeferral" and "testRoutines".

          modified: java/testing/org/apache/derbyTesting/functionTests/util/T_ConsistencyChecker.java
          modified: java/testing/org/apache/derbyTesting/unitTests/store/T_AccessFactory.java
          modified: java/testing/org/apache/derbyTesting/unitTests/store/T_QualifierTest.java
          modified: java/testing/org/apache/derbyTesting/unitTests/store/T_b2i.java

          Adjustment of old tests to use the deferred=false argument to ConglomerateController#insert.

          modified: java/testing/org/apache/derbyTesting/unitTests/store/T_SortController.java

          Instantiate sorters with default implementations for the new SortObserver interface methods.

          modified: java/engine/org/apache/derby/iapi/sql/conn/LanguageConnectionContext.java

          Import fixes, Javadoc fixes, renaming of pushNestedSessionContext, and new interface methods: setDeferred, isEffectivelyDeferred, setDeferredAll, getDeferredHashTables.

          modified: java/engine/org/apache/derby/impl/sql/conn/GenericLanguageConnectionContext.java

          Implementations of the above. Stores "deferredHashTables" for the transaction: the set of disk-based hash tables that hold deferred rows for the session, if any. New logic in doCommit and doRollback to handle deferred rows and reset modes. New logic to push constraint mode state on routine invocation in pushNestedSessionContext. Added popNestedSessionContext to validate any deferred rows if we switch back to immediate checking.

          modified: java/engine/org/apache/derby/iapi/sql/conn/StatementContext.java
          modified: java/engine/org/apache/derby/iapi/store/access/TransactionController.java
          modified: java/engine/org/apache/derby/impl/sql/execute/ForeignKeyRIChecker.java
          modified: java/engine/org/apache/derby/impl/sql/execute/GenericRIChecker.java

          Javadoc fixes.

          modified: java/engine/org/apache/derby/impl/sql/execute/BaseActivation.java

          Javadoc fixes and removal of an assertion that no longer holds with the added call in GenericPreparedStatement.

          modified: java/engine/org/apache/derby/impl/sql/compile/SetConstraintsNode.java

          Import fix.

          To apply the patch, you need to add empty versions of the two new files:
          java/engine/org/apache/derby/impl/sql/execute/DeferredDuplicates.java
          java/engine/org/apache/derby/impl/store/access/btree/index/B2I_10_4.java

          Kathey Marsden added a comment

          I have not had an opportunity to look at the patch, but have a general question. For XA will the deferred constraints be enforced at prepare time?

          Dag H. Wanvik added a comment

          Kathey, I haven't looked at the XA semantics yet; should they be enforced at prepare-to-commit time?

          Kathey Marsden added a comment

          That makes sense to me as I believe once prepared, nothing should interfere with the commit, but I say that just from what seems logical, not having done any spec or product research on the matter.

          Dag H. Wanvik added a comment

          Yes, I think it makes sense, too. I'll look around a bit to see what others do.

          Dag H. Wanvik added a comment

          Uploading a new version of the specification, see its latest rev. note (1.5).

          Dag H. Wanvik added a comment

          Uploading a second version, derby-532-unique-pk-2, which is essentially the same, just a refresh. I had to adjust some constants due to an intervening commit (see RegisteredFormatIds and StoredFormatIds) from the MERGE work.

          Uploading a new patch, derby-532-xa-1 (which builds upon pk-2). This adds support for checking deferred constraints in xa_prepare and xa_commit(.., true), cf. the specification, and adds test cases.

          Dag H. Wanvik added a comment

          As far as precedent for XA, I haven't been able to find much, but I did find that the Informix documentation mentions throwing XAException#XA_RBINTEGRITY for unsatisfied deferred constraints in the API docs for both xa_prepare() and xa_commit(). I had chosen the same exception, so it seems that's right. The case of xa_commit applies to the possibility of committing 1PC transactions [i.e. xa_commit(xid, true)], which do not require xa_prepare.

          Dag H. Wanvik added a comment

          derby-532-xa-2 uploaded (javadoc changes only relative to version 1).

          Dag H. Wanvik added a comment

          Attaching another patch, derby-532-import-1, which adds support for deferred constraints in connection with bulk insert, e.g. by import, and adds two test cases, ConstraintCharacteristicsTest#testImport and #testDeferredRowsInvalidation.

          The patch builds on the last rev of the previous two patches for this issue.

          It should also fix DERBY-6374 and adds a test case for it, ConstraintCharacteristicsTest#testDerby6374.

          Running regressions.
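
          A sketch of the import scenario the new test covers, assuming the standard SYSCS_UTIL.SYSCS_IMPORT_TABLE procedure and a hypothetical table T with deferrable constraint C (the file name is made up):

            SET CONSTRAINTS c DEFERRED;
            CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE(
                NULL, 'T', 'data.dat', NULL, NULL, NULL, 1);
            -- Duplicates in the imported data are remembered rather than
            -- rejected, and are validated at commit or on SET CONSTRAINTS
            -- C IMMEDIATE, just as for ordinary inserts.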

          Hide
          Dag H. Wanvik added a comment -

          Uploading version 2, derby-532-import-2 (minor clerical error).

          Mike Matrigali added a comment -

          I totally missed that this project was implementing yet another btree type, so sorry for the late review. I am still
          looking at the details, but I wonder if you ever considered any solution that did not require special-case code
          in the btree. I would like to understand what the problem would be with implementing deferred constraints
          using the existing "duplicate" btrees with no changes, and then doing all deferred constraint checking in the SQL
          layer, using probes through the existing access interfaces to ascertain constraint consistency (with maybe something
          more specialized if that allows for a faster path).

          I know you are following the path that the "unique" with duplicates code followed. I have regretted that approach
          since it was implemented. The project resulted in many bugs because of the complication involved in twisting
          the code to do something that it really did not want to do.

          Mike Matrigali added a comment -

          If the new btree approach is checked in, we should add tests to look out for bugs like DERBY-3502, which
          came about the last time a new, but similar, btree was checked in.

          Mike Matrigali added a comment -

          Can anyone speak to whether deferred constraint checking is in any way similar to the deferred updates already supported in the SQL layer of the Derby code? Is there any way to use that existing code to batch up a set of
          changes and run deferred constraint checking at a given time?

          Mike Matrigali added a comment -

          It looks like a lot of the fast-path code paths now have added code to deal with deferred update. Should
          the code, for instance in InsertResultSet.java, be broken out so that there is one routine to handle deferred constraints
          and one that does not, with some sort of inheritance to handle it?

          Mike Matrigali added a comment -

          Should there be changes to update code in addition to insert code, or is this one of the benefits of doing this work at the lowest level, where all updates on btrees are turned into deletes and inserts?

          Rick Hillegas added a comment -

          Hi Mike,

          I don't see much overlap between "deferred updates" and "deferred constraints". Deferred updates are meant to untangle the problem of a single statement writing to the same data container that it's reading. Deferred constraints are meant to handle the problem of complex, multi-statement updates which may make your data inconsistent temporarily. I don't see how the "deferred update" machinery, which is scoped to a single statement, can be pressed into service to handle the multi-statement problem of "deferred constraints." Thanks.

          Mike Matrigali added a comment -

          Deferring to end of transaction seems like definitely the hardest part of deferred constraints. As an incremental strategy, did you consider doing just the "alter table" level part first? I wonder how many customers would be helped by
          just being able to turn off constraint checking initially and enable it later, and are OK with taking the hit of dropping and recreating indexes for this at "check" time.

          Mike Matrigali added a comment -

          I am trying to understand the locking and isolation implications of doing the "duplicate" checking in the non-duplicate new btrees at insert time, and what the right thing to do is with rows marked deleted. I think doing this check at
          insert time is going to add unintended problems with either isolation or locking. I think the insert-time check for duplicates is really a performance measure to narrow down the number of deferred rows to keep track of, and I would like to see how badly a simpler strategy would perform, which could then be optimized with a more complicated btree implementation later if necessary. Would this type of implementation also match up well with whatever we would have to do for deferred foreign key constraints?

          We have always made the indexes created for constraint checking available to the optimizer to pull rows out of
          in the past. Should these new deferred-constraint indexes also be available, or is there anything special we can
          do with them to make the deferred implementation easier? I looked at some public documents and did not
          get any specific info on how other DBs do this, but got the feeling that these might be treated differently in some
          way vs. other indexes. We should be careful in the implementation to make sure stuff like sort avoidance for
          uniqueness works correctly with these.

          If the deferred unique constraint indexes were not made available to the optimizer to satisfy query results from
          (and thus the rows could be out of date until end of transaction), then another option would
          be to define the unique constraints exactly as they are today in the store - but somehow mark them as not
          available to the optimizer - and just defer the updating of this index until end of transaction, driving the updates
          from a deferred list maintained by the SQL layer. The obvious problem with that is that the inserting transaction
          should see rows it has inserted and won't. In this case all isolation-level locking would be unchanged (just delayed),
          and there would be no possible unexpected locking differences between a deferred and a non-deferred constraint.

          Could the SQL layer during deferred mode keep track of every insert into the "duplicate" constraint index and then do
          the constraint check at commit time itself? At end-of-transaction time I think it is clear that every row that is looked at to do the duplicate check needs to be first locked and waited on and then checked, with locks released according to
          isolation-level standards. This would include rows marked deleted, which would be locked to make sure they are not part
          of other transactions that have not committed yet. This check could be coded as a scan for the key (checking whether more than one row matches), or it would be pretty clean to add a special interface that returns true/false for more than one row matching a given key description.

          Dag H. Wanvik added a comment - edited

          Thanks a lot, Mike! I have answered inline below. I might not have understood all your points; please bear with me, and don't hesitate to ask again or elaborate on unclear issues.

          > I would like to understand what would be the problem with
          > implementing deferred constraints using existing "duplicate" BTree
          > with no changes, and then do all deferred constraint checking in the
          > sql layer doing probes to ascertain constraint consistency using
          > existing access interfaces (with maybe something more specialized if
          > that allows for faster path).

          Not sure if I understand you correctly there, but the approach taken is essentially to use the existing "duplicate" BTree approach implemented for nullable unique constraints/indexes, but with a twist:

          • when we see a duplicate on insert (this is already being checked, nothing new there), remember it
          • for primary key and unique not null constraints which are deferrable (and only those: non-deferrable constraints/indexes are unchanged except for a couple of extra tests, see below), we now use the "duplicate" BTree approach as well

          Somehow, we do need to intercept the current duplicate checks when BTree inserts happen, so they don't throw and the data does get inserted: otherwise we'd need to store duplicate rows somewhere else, and that would make operations later in the transaction difficult: for example, a query would need to look in two places for data. I didn't want to go there...
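
          To make the "remember it" bookkeeping concrete, here is a minimal sketch of the kind of per-transaction structure this implies; the class and method names are illustrative only, not the actual patch code (the patch's own class in this area is DeferredDuplicates):

              import java.util.Collections;
              import java.util.HashMap;
              import java.util.HashSet;
              import java.util.List;
              import java.util.Map;
              import java.util.Set;

              // Illustrative only: per-transaction bookkeeping of the duplicate
              // keys seen by deferred-constraint inserts.
              class DeferredDuplicateTracker {
                  // Maps a backing index (by conglomerate id) to the set of key
                  // values that collided during this transaction.
                  private final Map<Long, Set<List<Object>>> duplicates =
                          new HashMap<>();

                  // Called when a BTree insert detects, but no longer rejects,
                  // a duplicate.
                  void rememberDuplicate(long indexConglomId, List<Object> key) {
                      Set<List<Object>> keys = duplicates.get(indexConglomId);
                      if (keys == null) {
                          keys = new HashSet<>();
                          duplicates.put(indexConglomId, keys);
                      }
                      keys.add(key);
                  }

                  // Consulted at commit: each remembered key is re-probed, and
                  // any key still matching more than one non-deleted row
                  // violates the constraint.
                  Set<List<Object>> duplicatesFor(long indexConglomId) {
                      Set<List<Object>> keys = duplicates.get(indexConglomId);
                      return keys == null
                              ? Collections.<List<Object>>emptySet() : keys;
                  }
              }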

          > I know you are following a path that the "unique" with duplicate code
          > followed. I have regretted that approach since it was implemented. The
          > project resulted in many bugs because of the complication involved in
          > twisting the code to do something that it really did not want to do.

          Yes, I am aware of that; I did find one extra bug with this approach, which I have fixed as part of this work: DERBY-6374. If/when we redesign this approach, I hope we can carry the implementation for deferred constraints along somehow.

          > if the new BTree approach is checked in, should add tests to look out
          > for bugs like DERBY-3502 which came about the last time a new, but
          > similar BTree was checked in.

          > It looks like a lot of the fast path code paths now have added code to
          > deal with differed update. Should code for instance in
          > InsertResultSet.java be broken out and there to be one routine to
          > handle deferred constraint and one to not. With some sort of
          > inheritance to handle it?

          As far as the existing fast code path (for non-deferrable constraints and for indexes which are not "nullable unique") is concerned, the new code adds the following extra overhead:

          * IndexChanger#insertAndCheckDups: one extra simple boolean test:

                "!deferrable"

          * BTreeController#doIns: the existing two tests for

                btree.isUniqueWithDuplicateNulls()

            now have an extra test:

                btree.isUniqueWithDuplicateNulls() ||
                btree.hasDeferrableChecking()

          This could be optimized to just use the booleans directly:

                btree.uniqueWithDuplicateNulls ||
                btree.uniqueDeferrable


          So, in total, two extra simple boolean checks per insert, which isn't bad, I think. I am sure we could factor this out in a more object-oriented fashion, but I'd worry that would entail much redundancy and/or extra method calls on the current fast path.

          > should there be changes to update code in addition to insert code, or
          > is this one of the benefits of doing this work at lowest level where
          > all updates on BTree's are turned into deletes and inserts?

          Yes, we get that for free.

          > deferred to end transaction seems like definitely the hardest part of
          > deferred constraints. As an incremental strategy did you consider
          > doing just the "alter table" level part first. I wonder how many
          > customers would be helped by just being able to turn off constraint
          > checking initially and enabling later and are ok with taking the hit
          > of dropping and recreating indexes for this at "check" time.

          Yes, I did consider it, but we have use cases that can benefit from the standard SQL feature.

          > I am trying to understand the locking and isolation implications of
          > doing the "duplicate" checking in the non-duplicate new btrees at
          > insert time.

          Again, not sure if I understand 100% what you're asking, but: I tried to make insertion of duplicate rows work with the non-duplicate B-tree, but I had to abandon that approach. Cf. above: for deferred constraints we always use the approach of the duplicate "nullable unique" B-trees. Using that approach for deferrable PRIMARY KEY and UNIQUE "not null" constraints will show locking behavior similar to the current "nullable unique" implementation, I should think.

          Duplicate rows would keep their row X locks till transaction end, since as far as the index is concerned they are all unique (in that the key includes the row location of the base row). As far as isolation goes, I think only other transactions that use "read uncommitted" could ever see the duplicates.
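
          To illustrate the "all unique" point: in a duplicate-allowing index the base row's location acts as a trailing key column, so equal user keys still yield distinct entries. A conceptual demo, not engine code:

              import java.util.Arrays;

              // Conceptual demo: with the base row's location appended as a
              // trailing key column, two inserts of the same user key still
              // produce distinct index entries, so the BTree itself holds no
              // exact duplicates.
              public class DuplicateEntryDemo {
                  public static void main(String[] args) {
                      Object[] entry1 = {42, "page 17, slot 3"}; // key + row location
                      Object[] entry2 = {42, "page 17, slot 4"}; // same key, other row

                      System.out.println(Arrays.equals(entry1, entry2)); // false: entries differ
                      System.out.println(entry1[0].equals(entry2[0]));   // true: user keys collide
                  }
              }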

          > And what is the right thing to do with rows marked deleted.

          Yes, that's what I hit problems with.

          > I think doing this check at insert time is going to add unintended
          > problems with either isolation or locking.

          Given the above, I'm not sure I understand how. Could you explain what the concern would be here? Again, if the nullable unique indexes are OK as far as isolation and locking go, I think the deferred variants should be OK too?

          > I think the insert time check for duplicates is really a
          > performance thing to narrow down the number of deferred rows to keep
          > track of,

          Well, today the inserts will fail due to the duplicate checks in the B-tree, so we'd need to do something to intercept this in any case. Can you sketch in some more detail what you have in mind?

          > perform which then could be optimized later with a more complicated
          > btree implementation later if necessary. Would this type of
          > implementation also match up well with whatever we would have to do
          > for deferred foreign key constraints?
          >
          > We have always made the indexes created for constraint checking
          > available to the optimizer to pull rows out of in the past. Should
          > these new deferred constraint indexes also be available, or is there
          > anything special we can do with these to make deferred update
          > implementation easier. I looked at some public documents and did not
          > get any specific info on how other db's do this, but got the feeling
          > that these might be treated differently in some way vs. other
          > indexes. We should be careful on implementation to make sure stuff
          > like sort avoidance for uniqueness works correctly with these.

          I made sure that the deferred indexes all answer "false" to the isUnique() method which the optimizer consults, so I think this bars the optimizer from presuming (wrongly) that there is only one row. Again, this might make some queries on deferrable indexes less performant, but that's the price one has to pay to have deferred constraints. The existing non-deferrable unique indexes/constraints will not be impacted.
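
          A minimal sketch of the idea; the class and field names below are illustrative, not the actual descriptor code:

              // Illustrative sketch of the guard described above; the names are
              // made up, not the actual IndexRowGenerator code.
              class IllustrativeIndexDescriptor {
                  private final boolean unique;             // declared unique?
                  private final boolean deferrableChecking; // deferrable constraint?

                  IllustrativeIndexDescriptor(boolean unique,
                                              boolean deferrableChecking) {
                      this.unique = unique;
                      this.deferrableChecking = deferrableChecking;
                  }

                  // What the optimizer consults: a deferrable index always answers
                  // false, since duplicates may be present until the transaction
                  // commits.
                  boolean isUnique() {
                      return unique && !deferrableChecking;
                  }
              }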

          > If the deferred unique constraint indexes were not made available to
          > the optimizer to satisfy query results from (and thus the rows could
          > be out of date until end of transaction), then another option would be
          > to define the unique constraints exactly as they are today in the
          > store - but somehow mark them as not available to optimizer.

          Not sure I understand: meaning that queries couldn't use the index at all until transaction end? I think that would perform worse than what I currently do: offer a non-unique index.

          > The obvious problem with them is that the inserting transaction
          > should see rows it has inserted and won't. And just defer the
          > updating of this index until end of transaction, driving the updates
          > from a deferred list maintained by the sql layer.

          In the current solution, the inserting transaction does see the duplicate rows.

          > In this case all isolation level locking would be unchanged (just
          > delayed) and no possible unexpected locking differences between a
          > deferred and non-deferred constraint.

          The deferred case would retain locks on more rows till transaction end, which is to be expected. I think it is acceptable that deferrable constraints could show slightly different locking behavior than non-deferrable constraints in any case. But again, I might not understand exactly what you are driving at here...

          > Could the SQL layer during deferred mode keep track of every insert
          > into the "duplicate" constraint index and then do the constraint
          > check at commit time itself.

          Instead of just keeping track of the duplicates, you mean? Again, we'd need to make sure the inserts would succeed somehow. We do perform the constraint check at commit time in the present solution...

          > At end transaction time I think it is clear that every row that is
          > looked at to do the duplicate check needs to be first locked and
          > waited on and then checked, with locks released according to
          > isolation level standards.

          Agreed. The present solution may not be correct here; I'll take another look. The B-tree scan I use at commit time uses ISOLATION_READ_COMMITTED_NOHOLDLOCK to check for the duplicates: the transaction already holds X locks on the rows it inserted, but it would need read locks on any other rows with the same key. But presumably, for repeatable read/serializable, we'd already have those locks too?

          > This would include rows marked deleted, which would be locked to
          > make sure that are not part of other transactions that have not
          > committed yet. This check could be coded as a scan for the key

          That's what I do, I think? Cf. DeferredDuplicates#validate.

          > and not 2 rows or it would be pretty clean to add a special
          > interface to return true/false for more than one row match for a
          > given key description.

          Dag H. Wanvik added a comment -

          I forgot to add:

          > if the new btree approach is checked in, should add tests to look out
          > for bugs like DERBY-3502 which came about the last time a new, but
          > similar btree was checked in.

          The new backing indexes for deferrable constraints are not sharable for now, so we should avoid this class of errors.

          Dag H. Wanvik added a comment -

          Uploading refreshed versions of the three uncommitted patches (apply in sequential order to build):

          derby-532-unique-pk-3.diff
          derby-532-xa-3.diff
          derby-532-import-3.diff
          

          Also attaching a new patch, derby-532-more-tests-1, which adds more tests and changes the representation of the duplicates we store away in the hash table: we omit a unique counter and the row location of the original row, since neither is actually used, so the hash table can be reduced in size - we do not need to store all duplicates, just the key (once).

          Apply it on top of the first three. Running regressions.

          Mike Matrigali added a comment -

          Thanks, Dag, for the reply. Below is hopefully a clearer explanation
          of how I think it should work.

          I think we should implement all deferrable unique constraints using the
          existing, unmodified non-unique btree implementation. At least as a
          first incremental approach, I believe this should function correctly and
          work with your design.

          Constraint checking architecturally seems like something that should happen
          at the SQL layer. I know the system does the existing uniqueness checks in store
          code, but as we have seen, as this constraint checking gets more convoluted, at
          least to me it seems best managed closer to the SQL level, which understands
          the datatypes, the nullability rules, and in this case the timing of
          when that checking is meant to be done. It looks
          like Oracle implements their deferred unique checking using a non-unique
          index also.

          Using a pure duplicate index matches the reality of a deferred unique
          constraint. At any time when the constraint is deferred, there may
          be duplicate keys seen by the inserting transaction during query
          execution, so the optimizer and execution should not assume the index is unique.
          Having no special casing in the btree will, I think, lead to fewer bugs
          overall, as it should force all code to treat the index as what it really is:
          a non-unique index.

          o constraint definition time:

            o create a pure duplicate-allowing index matching the constraint. It is
              important that the catalog for this index, used by the optimizer and
              execution, knows that it is non-unique, even though it will be used to
              implement unique constraints.

          o sql layer insert/update time, deferrable constraint (constraint deferred):

            o just go ahead and insert the row into the duplicate-allowing index. There
              should never be a unique error thrown. Do not have any code at this point
              that tries to determine anything about the constraint.

              It sounded like you were seeing unique errors at this point initially,
              which I don't understand.

              At the SQL layer, add all inserts into the btree to a backing-store hash
              table.

              As you mentioned, this approach avoids having any sort of shadow table;
              the index is always up to date. Locking prevents other transactions
              from seeing the possible duplicates, and I assume the SQL standard says it
              is OK to see these duplicates from the same transaction.

          o sql layer insert/update time, deferrable constraint (constraint not deferred):

            o after the insert, run the same duplicate check you would have run if it
              were deferred, just after the insert. If it fails, throw an error and
              back out from the SQL layer.

          o sql layer commit time with outstanding constraint checking:

            o now simply run through the entire list doing constraint checking (see
              the sketch after this list). This is actually less complicated from a
              locking perspective, since a probe into the tree using just the key will
              always go to the left-most matching key, and the locking will be
              guaranteed left to right, which avoids deadlocks.

            o by using a backing-store list you won't run out of memory for the
              to-be-processed constraint list. This apparently was a problem in one
              posting about Postgres found via Google.

            o I think this step is one area where, once the system is running, I would
              suggest a follow-on patch to add some sort of
              "reopenScanAndCheckForDuplicate()" interface to GenericScanController,
              if performance looks critical. It could package positioning the scan and
              checking for duplicates in one call and minimize latch contention.
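
          A minimal sketch of the commit-time pass outlined above; countRowsMatchingKey and DuplicateKeyException are hypothetical stand-ins for a key probe through the existing scan interfaces and the resulting error:

              import java.util.List;
              import java.util.Set;

              // Sketch of the commit-time pass: probe each tracked key once and
              // fail the commit if any key still matches more than one row.
              // countRowsMatchingKey is a hypothetical stand-in for a BTree
              // probe (startkey = stopkey = the key) through the existing scan
              // interfaces.
              abstract class CommitTimeChecker {
                  static class DuplicateKeyException extends Exception {
                      DuplicateKeyException(String msg) { super(msg); }
                  }

                  abstract long countRowsMatchingKey(long indexConglomId,
                                                     List<Object> key);

                  void checkDeferredUniqueness(long indexConglomId,
                                               Set<List<Object>> trackedKeys)
                          throws DuplicateKeyException {
                      for (List<Object> key : trackedKeys) {
                          // Probing with just the key positions the scan at the
                          // left-most match, so locks are acquired left to right.
                          if (countRowsMatchingKey(indexConglomId, key) > 1) {
                              throw new DuplicateKeyException(
                                      "duplicate key: " + key);
                          }
                      }
                  }
              }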

          For me the benefits of this approach are:

          1) It moves any "extra" locking necessary to verify duplicates in the
          deferred-checking case to the end of the transaction, so there is less
          chance of unexpected concurrency effects in the middle of a long-running
          transaction.

          2) It seems simpler to prove correct. In the currently proposed solution the
          constraint becomes corrupt if for any reason the first check on inserting
          into the duplicate btree misses a possible duplicate.
          There are a number of subtle problems with
          checking "left and right" during insert into a non-unique btree for
          duplicates. The code may be correct, but testing is very hard, and the
          cases include combinations of all the following:
          o committed deleted rows, of other transactions and of the transaction itself
          o non-committed deleted rows, of other transactions and of the transaction itself
          o when going left, having to lose the latch and "restart" all work
          o other transactions aborting after the check
          o concurrent other transactions doing the same insert at the same time
          o does the existing locking guarantee that the 2nd inserter always does
          the final deferred check, and the 1st one does not have to do a check?
          Should we be forcing that order of checks, or is it more concurrent to
          just have whoever ends their transaction first do the check?

          3) I think this same architecture should allow for one shared implementation
          to also be used by deferred foreign key constraints. I would like some
          feedback on whether this makes sense or not. I don't know much about existing
          foreign key constraints or how/if we need to do work to make them deferrable.

          Dag H. Wanvik added a comment -

          Thanks, Mike. Your proposal makes sense to me, and I agree with your architectural point about keeping language-level considerations out of the store. I believe what you propose should work, but it would be slightly less performant than what I currently do, in that we would be storing away (and checking at commit time) all inserted keys, not only those that are duplicates. But I agree it would be safer if we don't fully trust the current "left/right" checking, and also simpler, as far as reducing special code in the store goes. I'll have a closer look at such an approach and report back.

          Also, with such a solution, at commit time two concurrent transactions both inserting duplicates would both be running checks. Could that give rise to deadlocks, I wonder? Could both have a row-level X write lock on a duplicate in addition to an existing (left-most) unlocked row, say? Or can we latch the X-locked row in the BTree to detect a duplicate anyway?

          As far as foreign key constraints are concerned, I haven't looked closely at those yet, but my hunch is that we could reuse the mechanism developed for pk/unique.

          Mike Matrigali added a comment -

          > Also, with such a solution, at commit time two concurrent transactions both
          > inserting duplicates would both be running checks. Could that give rise to
          > deadlocks, I wonder? Could both have a row-level X write lock on a duplicate
          > in addition to an existing (left-most) unlocked row, say? Or can we
          > latch the X-locked row in the BTree to detect a duplicate anyway?

          This is interesting; we could easily cause deadlocks if not careful. I think
          this is an issue with either implementation if the "pre-commit" checking gets locks on
          rows while checking for duplicates. I think in the current patch
          the problem happens if two transactions insert the same key as an
          existing key at the same time, while in my proposal it happens sooner, when two transactions insert the
          same key at the same time, and does not need an existing key. There are other
          scenarios, but these seem the most straightforward.
          At least in the proposed implementation
          the check phase always asks for locks left to right. I think deadlocks would
          be pretty much guaranteed for competing transactions inserting duplicate keys
          if we go ahead and get shared locks while checking
          for duplicates. I think in the non-deferred case the 2nd competing transaction
          would wait on insert for the first transaction.

          I have not thought this all the way through, but the options seem to be (in
          all cases we would have an exclusive latch on the page we are looking at, and
          thus the row cannot be changing underneath us):

          1) Get shared waiting locks on each row while doing the duplicate checking.
          If we get the lock, do the obvious checks (i.e. if deleted, no check; if
          not deleted, do the dup check). If we get a timeout or deadlock, I suggest
          catching it and throwing a duplicate key error. It means that another
          transaction has an active write action on a duplicate row, so it seems
          reasonable to return a duplicate error.

          I think it would be bad to let the timeout or deadlock escape to the user,
          as it is hard to explain how a transaction doing a single insert is
          deadlocking.

          2) Get shared non-waiting locks on each row (sketched below). This will throw more "in-flight"
          duplicate key errors than #1, but will not pay the wait/deadlock wait-time
          penalty. Do the same error handling as in #1, but we will only get timeouts.

          3) Don't do any locking. Then we would have to throw a duplicate error even on
          deleted rows, as there is no way to tell whether they are committed deleted or
          not.

          I would not do option #3. I lean toward option #2, but would be OK with
          option #1.
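
          A sketch of the error mapping in option #2; lockRowNoWait, keysMatch and DuplicateKeyException are hypothetical stand-ins, not existing store interfaces:

              // Sketch of option #2: take a shared, non-waiting lock on each
              // candidate row during the duplicate check, converting a lock
              // conflict into a duplicate-key error rather than surfacing a
              // timeout or deadlock to the user.
              abstract class NoWaitDuplicateCheck {
                  static class DuplicateKeyException extends Exception {
                      DuplicateKeyException(String msg) { super(msg); }
                  }

                  abstract boolean lockRowNoWait(Object rowLocation); // false on conflict
                  abstract boolean keysMatch(Object rowLocation);     // the actual dup check

                  void checkCandidateRow(Object rowLocation, boolean deleted)
                          throws DuplicateKeyException {
                      if (!lockRowNoWait(rowLocation)) {
                          // A concurrent transaction has an active write on a
                          // possible duplicate; report a duplicate error, not a
                          // timeout or deadlock.
                          throw new DuplicateKeyException(
                                  "concurrent writer on candidate row");
                      }
                      if (!deleted && keysMatch(rowLocation)) {
                          throw new DuplicateKeyException("duplicate key found");
                      }
                      // A deleted row whose lock we obtained is committed
                      // deleted: no check needed.
                  }
              }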

          I will think about whether there is anything smart we can do if we know the
          rows we are looking at are "locked for insert", but nothing comes to mind
          yet.

          Knut Anders Hatlen added a comment -

          What would the alternative solution do when the deferrable constraint operates in immediate mode? Then the index that is backing a primary key constraint should behave as a normal unique index, even though it is non-unique at the store level. So in that mode I suppose it would still need to do a left and right check on every insert somehow?

          Dag H. Wanvik added a comment -

          Yes, the present patch solves this by throwing instead of saving the duplicate key. I'm not sure how this would work
          with Mike's proposed approach - we do need this dynamic mode switching in order to comply with the standard.
          Maybe we could make a store API that provides the language layer with the ability to decide whether to insert or not (depending on the constraint mode), e.g. by returning the keys to the left and right of the key (with locked rows) of the row to be inserted. I sort of feel it would be a shame in any case to lose the improved performance early checking would give... we have indexes in part to support constraints, after all.

          Knut Anders Hatlen added a comment -

          If we want to do the duplicate checking in the language layer, would the following work?

          • Before inserting the new row into the index, do a BTree scan with serializable isolation level and startkey = stopkey = key columns of the new row.
          • Check if the scan returned an empty result. If it was empty, just insert the row (the range should be protected by the serializable scan, so it should be safe). If it was not empty, throw an exception (on an immediate check) or store the key in a hash table (on a deferred check).

          That might save us from doing the left/right check, and also from having to check all inserted rows again at commit. A sketch of this flow follows.
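
          Purely as an illustration, here is a minimal sketch of that flow in plain Java. The KeyScan and Index interfaces are hypothetical stand-ins for whatever store API would actually back this; they are not Derby's real interfaces.

              import java.util.Arrays;
              import java.util.HashSet;
              import java.util.List;
              import java.util.Set;

              final class ProbeThenInsert {

                  /** Hypothetical serializable probe: startkey = stopkey = the new key. */
                  interface KeyScan {
                      boolean keyExists(Object[] key);
                  }

                  /** Hypothetical duplicate-allowing index backing the constraint. */
                  interface Index {
                      void insert(Object[] key);
                  }

                  /** Keys to re-check at commit when the constraint is deferred. */
                  private final Set<List<Object>> deferredDuplicates = new HashSet<>();

                  void insertKey(Index index, KeyScan scan, Object[] key, boolean deferred) {
                      if (!scan.keyExists(key)) {
                          // The serializable scan's previous-key lock protects the range,
                          // so no other transaction can slip in a duplicate before we
                          // do the insert ourselves.
                          index.insert(key);
                      } else if (deferred) {
                          // Deferred mode: remember the key and check again at commit.
                          deferredDuplicates.add(Arrays.asList(key));
                          index.insert(key);
                      } else {
                          // Immediate mode: report the duplicate right away.
                          throw new IllegalStateException(
                                  "duplicate key: " + Arrays.toString(key));
                      }
                  }
              }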

          Dag H. Wanvik added a comment -

          This sounds promising. As a bonus, we might also be able to leverage this approach to get rid of the left/right checking for "normal" (not deferrable) nullable unique constraints, so we could retire the old error-prone approach.

          Mike Matrigali added a comment -

          My proposal would be to use, hopefully, the exact same code to check for duplicates in all deferrable cases.
          The only difference would be the timing of the check.

          So in both cases just do the insert as normal into the duplicate allowing index.

          In the deferred case we track that we need to do a duplicate check and then do it later.
          In the immediate case we just do the duplicate check right away. Hopefully the checking code is the same.
          And I do think there is an obvious benefit, and cleanliness, in adding a new store interface to do the check. In
          both cases other transactions don't see the duplicate key because of locking, and deleting the
          duplicate rows is taken care of by the duplicate-key abort.

          I think Knut's suggestion might work. I would rather see us first implement everything the same way (deferred and
          immediate) and measure, then see if it is worth adding different code paths for optimization.

          +1 to later thinking about removing all the left/right checking and use this same approach for
          nullable unique constraints.

          It does seem to be well known in other systems that deferrable constraints are likely not to perform
          as well as non-deferrable constraints. So I think it is ok for the 1st implementation to not match performance, just
          be correct. I do wonder how often users actually want to defer uniqueness constraints other than possibly in
          initial loading situations. I think the big request is really about deferred foreign key constraints, things like
          loading a child before a parent, or some set of circular constraints that are hard to order.

          Mike Matrigali added a comment -

          As an initial development test I would suggest changing the system to define all non-deferrable unique constraints as deferrable (but not deferred), and running the complete set of tests. I do worry about unintended
          locking issues. Obviously you would not want to check this in, but it would be a good sanity check; maybe some of the lock tests would show them.

          Dag H. Wanvik added a comment -

          Good idea to run all tests with default deferrable constraints but with immediate checking; I'll run this newly uploaded patch, derby-532-serializable-scan-1, through that experiment.

          This patch removes the special logic in BTreeController, and moves the extra checking to language (IndexChanger is in package *.impl.sql.execute).

          It essentially uses the approach Knut suggested to force strict same-order locking: a serializable scan on the key of the index row proposed for insertion. For a deferrable primary key that inserts a non-duplicate, this gives the following locks, with a number indicating the order in which they are set:

          U[v-1] : an update lock on the previous row in the index (2.)
          X[v]: an exclusive lock on the newly inserted row (1. set before the U-lock when inserting into base table)

          For a normal primary key insert, we only get the latter lock.
          For an insert of a (first) duplicate, we would see the following locks:

          U[v-1]: an update lock on the previous row in the index (2.)
          U[v]: an update lock on the first inserted value, for which we now insert a duplicate (3.)
          X[v]: an exclusive lock on the newly inserted row. (1.)

          Since any transaction wanting to insert "v" would need to lock "v-1" to the left, we would effectively serialize any contending transactions behind the first one to insert "v"; meaning they would detect the duplicate, and throw (not deferred) or postpone further checking till commit (deferred mode).

          A test case verifying this has been added in ConstraintCharacteristicsTest#testLocks.
          The patch is a sum of the patches not yet committed so far (no prerequisites). I had to make one small concession: the unique nullable constraints needed an extra predicate inside BTreeController to allow them to insert a duplicate. An alternative would be to mark these indexes as something other than "uniqueWithDuplicateNulls" when constraints are deferred.

          The checking at commit time also uses a BTreeScan, but not an identical one: it uses a read-committed, read-only, no-hold-lock scan. I think this should be safe.

          Dag H. Wanvik added a comment - - edited

          Some thoughts about correctness in the last patch; I'd appreciate help in thinking through this.

          Deferrable constraint correctness

          1. Using a serializable BTree scan on a physically non-unique index:
            intending to insert key "v", we navigate to the first match, if
            any, of the key (the base row Location - the last column - is not
            used in the scan key even though it is physically part of the
            index). This leaves the scan so that the first "next" lands on the
            first "v" if it exists, or just past the slot containing the key
            just before "v" (call it "v-1").
          2. Transactions intending to insert value "v" will need to U-lock the
            slot containing the value "v-1", since the serializable scan
            always locks the previous row.
          3. One and only one transaction can find "v" missing (and hence OK to
            insert), namely the first to lock "v-1" as part of a serializable
            BTree scan. [Any other transaction type with a U lock on "v-1"
            would not be looking to insert "v"; if it did, it would need to do
            the same thing.] Later transactions to lock "v-1" would find "v"
            committed, and so would need to throw (immediate mode), or
            postpone checking till transaction commit (deferred mode).
          4. If the lock holder of "v-1" (the first transaction) decides to
            delete "v" or roll back, the road will be open for other
            transactions to insert "v" once the U-lock on "v-1" is released by
            transaction one.
          5. Within a transaction, SQL mandates that duplicates be visible to
            self later in the transaction, which holds since self owns the
            X-lock till transaction commit time.
          6. Any other transaction would not see "v" until the X-lock is
            released at commit time (with the exception of read-uncommitted
            transactions).
          7. Deleted slots are handled (presumably) correctly by the existing
            BTreeScan machinery.
          8. The above holds for both unique nullable and unique not nullable
            deferrable constraints, since in both cases we use the same
            physically non-unique index and the same method of checking
            uniqueness with a serializable BTree scan: the only difference is
            that insertion of "v" goes ahead in the presence of an existing
            "v" iff the key contains one or more nulls. We forgo the
            left-right checking currently being done for non-deferrable
            nullable unique indexes.
          9. At commit time, all transactions that inserted "v" but found it to
            be a duplicate must check that "v" is now present in at most one
            row in the BTree. For this we use a read BTreeScan which releases
            its locks as it goes along (ISOLATION_READ_COMMITTED_NOHOLDLOCK).
            By navigating to the first "v" we have two cases for the
            transaction:
            • "v" was present before the transaction inserted its duplicate:
              the transaction already has a U-lock on the first "v" and an
              X-lock on the newly inserted ones. So the scan should be able to
              position a read scan on both kinds of rows, and thus detect
              whether there is more than one of them.
            • There was no "v" present before this transaction came along, in
              which case it has X-locks on all the inserted "v"s. In this case
              too, the transaction has the locks it needs to position a read
              scan and check.
          10. Possible deadlocks: since the writing transaction takes U and X
            locks, no other writing transaction can take U or X locks on these
            rows (only R is compatible with U). So since a transaction has U
            and X locks on "v-1" and "v" and doesn't need to upgrade to more
            locks during the transaction (at least for the insert of value
            "v"), two transactions that insert different values should be free
            to act concurrently, provided the values are not adjacent in the
            BTree.
          11. As for the checking phase at commit time, the read-only
            BTreeScan used takes read locks on rows we already hold locks on,
            so again multiple transactions should be free to move forward
            concurrently, provided they insert non-adjacent values.
          12. Any deadlocks should thus be the result of transactions trying
            to read each other's inserted values before commit (normal), e.g.

            t1: X[v] wants R[z]
            t2: X[z] wants R[v]

            This window increases slightly, as the transactions now need to
            U-lock "v-1", which isn't necessary with non-deferrable PK
            indexes, at least not in read-committed mode.

          Mike Matrigali added a comment - - edited

          I will look at this, though it does seem quite complicated. Is it any less complicated to prove correct if you always do the check after the insert?

          For review I am first just going to concentrate on the algorithm described above. If we all agree on it working, then I will look at the code. Hopefully the code will also have the above description as comments somewhere.

          Mike Matrigali added a comment -

          question: Is it possible for 2 transactions to be operating at the same time on a deferrable constraint and in one case the checking is deferred and in the 2nd case the checking is immediate?

          Mike Matrigali added a comment -

          >I had to make one small concession: the unique nullable constraints needed an extra predicate inside >BtreeController to allow them to insert a duplicate. An alternative would be to mark these indexes as >something else than "uniqueWithDuplicateNulls" when constraints are deferred.
          In the deferrable case I think these should be just a standard duplicate-allowing index, and do the
          "allow duplicate" checking in language. I believe it is very easy: if any key column in the row is null, always
          allow it; otherwise it becomes the same case as the other deferred uniqueness constraints.

          Mike Matrigali added a comment -

          Could you describe what behavior you expect from the current algorithm in the following case:
          deferrable constraint, in deferred mode
          xact 1:
          inserts v
          does not commit
          xact 2:
          inserts duplicate v

          Reading the above, it seems like xact 2 is going to block doing the "ahead" check, which does not seem
          to be what a deferred constraint wants to happen. (A hypothetical JDBC rendering of the scenario follows.)
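
          For concreteness, the scenario could look like this in JDBC. The table T, the constraint name PK_T, and the database URL are invented for the example; SET CONSTRAINTS is the SQL-standard statement this issue adds.

              import java.sql.Connection;
              import java.sql.DriverManager;
              import java.sql.SQLException;

              public class DeferredInsertScenario {
                  public static void main(String[] args) throws SQLException {
                      // Assumes table T with deferrable primary key PK_T already exists.
                      Connection c1 = DriverManager.getConnection("jdbc:derby:testdb");
                      Connection c2 = DriverManager.getConnection("jdbc:derby:testdb");
                      c1.setAutoCommit(false);
                      c2.setAutoCommit(false);

                      c1.createStatement().execute("SET CONSTRAINTS PK_T DEFERRED");
                      c1.createStatement().execute("INSERT INTO t VALUES (1)"); // xact 1 inserts v
                      // xact 1 does not commit

                      c2.createStatement().execute("SET CONSTRAINTS PK_T DEFERRED");
                      // Under the serializable-scan patch, this insert waits on xact 1's
                      // locks; the question is whether it should instead proceed and
                      // fail, if at all, only when c2.commit() runs the deferred check.
                      c2.createStatement().execute("INSERT INTO t VALUES (1)"); // duplicate v
                  }
              }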

          Mike Matrigali added a comment -

          I thought this patch was removing the dependency on a new btree version. If a new version is needed, could you comment on why? If not, then there is some leftover stuff in the patch dealing with a new btree version
          (format ids, upgrade stuff, ...).

          Knut Anders Hatlen added a comment -

          As to locking, I think it should be safe to release the prev key lock once the new key has been inserted (both in the immediate case and the deferred case). At that point, the X lock on the new key will be enough to prevent other transactions from inserting a duplicate key (they will be blocked when doing the initial scan to see if the key is already in the index). That might allow a slightly higher level of concurrency. I don't know if we currently have the interface to release the prev key lock early, though. And in any case that's clearly an optimization that could wait until we have a working solution.

          Dag H. Wanvik added a comment -

          Thanks, Mike & Knut!

          > question: Is it possible for 2 transactions to be operating at the
          > same time on a deferrable constraint and in one case the checking is
          > deferred and in the 2nd case the checking is immediate?

          Yes, the constraint mode is session local according to the standard. The present patch allows this, I think.

          >>I had to make one small concession: the unique nullable constraints
          >>needed an extra predicate inside BtreeController to allow them to
          >>insert a duplicate. An alternative would be to mark these indexes as
          >>something else than "uniqueWithDuplicateNulls" when constraints are
          >>deferred.
          >
          >In the deferable case I think these should be just standard
          >duplicate allowing index, and do the "allow duplicate" checking in
          >language. I believe it is very easy. If any key in the row is null
          >always allow it, otherwise it becomes the same case as the other
          >defered uniqueness constraints.

          Yes, and this is indeed what we do for nullable unique deferrable constraints. I think I can fix that by simply not marking the index conglomerate as uniqueWithDuplicateNulls in the deferrable case.

          As for the locking case, xact 2 would block. I agree with Knut that we could optimize the locking, as suggested, later.

          > If a new version is needed, could you comment on why?

          Cf. the above: if we handle uniqueWithDuplicateNulls as discussed, the need to mark the conglomerate with hasDeferrableChecking goes away (and we no longer need isUniqueDeferrable at the BTree level), so we don't need the format change. I'll roll a new patch.

          Mike Matrigali added a comment -

          I don't think it is just the prev key lock that is a problem, but all the locking done prior to the insert - at least in the deferrable constraint/deferred case.

          In the deferrable constraint/deferred case my assumption is that an insert should only block where a user-initiated serializable transaction is blocking inserts into a range of keys. In that case it must block, and that is handled by locking done as part of the insert, which is not changed by this project.

          Dag H. Wanvik added a comment -

          Attaching patch derby-532-serializable-scan-2. This removes the (new) btree conglomerate format as discussed, and also adds two more test cases:

          a) to testLocks we add a test to verify that xact 2 must wait to insert (another) duplicate behind xact 1
          b) to testImport, we add a test case to verify that multiple rows containing a null will not fail even for a deferred constraint.

          It also adds the notes on the approach & its correctness as comments to the code in IndexChanger, as requested by Mike. Re-running regressions.

          Dag H. Wanvik added a comment -

          Mike, I'm not sure what you are getting at here: as far as I can see, the serializable scan only locks the previous key ("v-1") and possibly an existing instance of the value we are trying to insert ("v"). If that's correct, and you are not worried about the lock on the previous key, what other locks are you concerned about? Are you saying the serializable scan takes locks on values below "v-1" as well? I thought I looked at that and found it didn't, but I could be mistaken.

          Mike Matrigali added a comment -

          Sorry to add the confusing note about the serializable scan, as I actually don't think there is a problem
          with that part of the implementation.

          I believe it is wrong, in the example I asked about above (the deferrable constraint/deferred case), for an
          insert ever to block if there are no serializable scans in the system. The current patch ensures this blocking
          behavior, and will continue to ensure it even if we figure out how to release the "extra"
          previous key lock.

          Any implementation that tries to do a check prior to the insert is likely to suffer from this issue, which is one
          of the reasons I still think it would be better always to do the check after the insert, and not try any
          duplicate-prevention locking schemes that are likely to present unintuitive concurrency issues to the
          application.

          Dag H. Wanvik added a comment - - edited

          Ok, thanks. I understand. I'll have a look at such an implementation, too. It shouldn't be too hard to check right after the insert with a read-only scan for "v", but I'd like to avoid saving all inserts under the pessimistic assumption that they are all duplicates.

          Mike Matrigali added a comment -

          One other consideration is that Derby actually has more row lock types than S, U, and X. We don't really document them, as it is more confusing than it is worth to users, and the lock debugging tools map the 8 locks onto these 3 letters. There are actually 8 different types of locks
          defined in org.apache.derby.iapi.store.raw.RowLock, and you can see the complete compatibility table
          there.
          /* Row Shared lock for repeatable read and below isolation level */
          public static final RowLock RS2 = new RowLock(0);
          /* Row Shared lock for serialized read isolation level */
          public static final RowLock RS3 = new RowLock(1);
          /* Row Update lock for repeatable read and below isolation level*/
          public static final RowLock RU2 = new RowLock(2);
          /* Row Update lock for serializable isolation level*/
          public static final RowLock RU3 = new RowLock(3);
          /* Row Insert previous key lock */
          public static final RowLock RIP = new RowLock(4);
          /* Row Insert lock */
          public static final RowLock RI = new RowLock(5);
          /* Row exclusive write lock for repeatable read and below isolation level */
          public static final RowLock RX2 = new RowLock(6);
          /* Row exclusive write lock for serializable isolation level */
          public static final RowLock RX3 = new RowLock(7);

          The one of interest to this discussion is the RIP lock, which implements the previous key lock for inserts. This lock did not exist in the initial implementations, but was required as users found concurrency unacceptable when we just got a regular write lock as the previous key lock for inserts.

          Inserts get the right previous key locks. Doing a scan before the insert is going to get the wrong previous
          key lock for an insert to have, and it is also going to block in cases I don't think are expected by someone
          using deferred constraints.

          Here is the compatibility table:

              /** Row lock compatibility table. */
              private static final boolean[][] R_COMPAT = {
                  //          Granted
                  // Request   RS2     RS3    RU2    RU3    RIP    RI     RX2    RX3
                  //
                  /* RS2 */    {true,  true,  true,  true,  true,  false, false, false },
                  /* RS3 */    {true,  true,  true,  true,  false, false, false, false },
                  /* RU2 */    {true,  true,  false, false, true,  false, false, false },
                  /* RU3 */    {true,  true,  false, false, false, false, false, false },
                  /* RIP */    {true,  false, true,  false, true,  true , true,  false },
                  /* RI  */    {false, false, false, false, true,  false, false, false },
                  /* RX2 */    {false, false, false, false, true,  false, false, false },
                /* RX3 */    {false, false, false, false, false, false, false, false }
            };

          Dag H. Wanvik added a comment -

          Thanks for that information, Mike. Interesting; I wasn't aware of this. I see that RIP is compatible with the other X-locks except the serializable ones (symmetrically). So, for the BTree serializable scan, couldn't we make the scan optionally use this kind of lock?

          Mike Matrigali added a comment -

          I do think we need to think hard about a similar set of issues to those you wrote up for the single post-check after the insert, to
          convince everyone of correctness. But I think the post scan need only answer a single question: did the
          insert I made cause a duplicate key? I don't think it needs to prevent a subsequent duplicate key from
          being inserted, even while the scan is happening, so it can do a simpler scan,

          assuming:
          1) all inserts into this tree do the same left-to-right scan for duplicates after the insert. It may happen
          right after the insert or it may happen at end of transaction.
          2) the scan positions at the first instance of the key (with no row position).
          If the scan comes back with no key, assert if it is an immediate check, as something is seriously wrong;
          otherwise, in deferred mode, I think it is ok, as the inserting transaction may have deleted it in the meantime
          and we should not try to track that.
          If the key exists, the scan waits for an S lock on the key. If it gets the lock and the row is deleted, then all
          is fine. If it gets the lock and the row is live, note it if it is the first, or throw a duplicate key error if it
          is the 2nd. The normal case is that the scan finds one and only one row.
          If it gets a lock timeout or a deadlock, then I am not positive what to do. I lean toward catching it and
          turning it into a duplicate key error. I think applications are much more likely to handle that than an
          unexpected deadlock. I think the easy case of 2 concurrent inserts of the same key under a deferred
          constraint is going to generate a deadlock, and users are not going to expect a system with 2 xacts doing
          just a single insert each to generate a deadlock. A sketch of this check follows.
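
          A minimal sketch of that check, again against hypothetical types rather than Derby's real BTreeScan API:

              import java.util.Iterator;

              final class PostInsertCheck {

                  /** Hypothetical handle to one matching index row. */
                  interface Row {
                      void lockShared() throws LockTimeoutException; // waits; may time out
                      boolean isDeleted();
                  }

                  static class LockTimeoutException extends Exception {}
                  static class DuplicateKeyException extends Exception {}

                  /**
                   * @param rows     rows matching the inserted key, leftmost first
                   * @param deferred true when the check runs at commit time
                   */
                  static void check(Iterator<Row> rows, boolean deferred)
                          throws DuplicateKeyException {
                      int live = 0;
                      while (rows.hasNext()) {
                          Row r = rows.next();
                          try {
                              r.lockShared();
                          } catch (LockTimeoutException e) {
                              // Another transaction has an active write on a matching
                              // row: report a duplicate rather than an unintuitive
                              // deadlock or timeout.
                              throw new DuplicateKeyException();
                          }
                          if (!r.isDeleted() && ++live > 1) {
                              throw new DuplicateKeyException();
                          }
                      }
                      // live == 0 is fine in deferred mode (our row may have been
                      // deleted again); on an immediate check it would be a serious bug.
                      assert deferred || live > 0;
                  }
              }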

          Mike Matrigali added a comment -

          As part of trying to understand deferred constraints I did a bit of searching on the Web, and one thing that kept popping up is a lot of experts recommending that database designers not pick deferred constraints, if at all possible. One of the main reasons is a consensus that many, if not most, applications do not code for errors coming out of a "commit". I believe the standard has always allowed commit to fail,
          but previous to this, I think the only things in Derby that would make a commit fail would be some sort of
          server-crashing event like a low-level bug, or the transaction logging system running out of resources to
          write the commit log record.

          So I suggest that we highlight, in the documentation for this project and for the individual new DDL, that using this feature calls for applications to have error handling for a failing commit. This may be new for many applications. It would probably be good to show some sample code on how to handle a deferred unique constraint failing at commit time.
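
          For example, along these lines (ordinary JDBC; which SQLState Derby will report for a deferred constraint failing at commit is not settled here, so the handler is generic):

              import java.sql.Connection;
              import java.sql.SQLException;

              final class CommitWithDeferredConstraints {
                  static void commitOrReport(Connection conn) throws SQLException {
                      try {
                          conn.commit();
                      } catch (SQLException e) {
                          // With deferred constraints, the constraint check itself can
                          // make commit fail. Make sure the transaction is cleanly
                          // rolled back before reporting or retrying with fixed data.
                          conn.rollback();
                          throw e;
                      }
                  }
              }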

          Mike Matrigali added a comment -

          >Thanks for that information, Mike. Interesting; I wasn't aware of this. I see that RIP is compatible with the other >X-locks except the serializable ones (symmetrically). So, for the BTree serializable scan, couldn't we make >the scan optionally use this kind of lock?
          The RIP lock is used to stop the insert if a serialized scan is going on; it is not used by the existing serialized scan. So I don't think it fixes the problems I have raised, even if we were to provide some new locking scan.

          I think the key requirement we should agree on, which I am currently stuck on, is what an insert into a
          deferrable constraint in deferred mode should do when there exists an uncommitted insert of the same key
          in the table. I think the insert should proceed, and the current patch blocks. And I think the current patch
          would block even if we did not get a previous key lock, because we scan for duplicates ahead of the insert
          in the deferred case.

          Mike Matrigali added a comment -

          I don't think it is a big deal to add all inserts to the deferred check list when you are in deferred mode, but I recognize others may not agree.
          I consider any code that eliminates inserts from the list early to be just an optimization, but any problem with
          the optimization can easily lead to a corrupt table if it misses just one case it should have checked later. I think
          we agree at this point that all the other pieces still need to happen.

          If you think the optimization is necessary, do you think the following can be proven correct, i.e. that we will
          never miss a case where the row we inserted is causing a duplicate key and should have been added to the
          deferred check list (a sketch follows below):
          o do the insert first.
          o in the deferred case do an immediate non-serialized, no-hold, no-lock-wait, read-only scan of the keys.
          If there is a duplicate, add it to the list, and if the scan could not get a lock, add it to the list. Locks have
          to be acquired for store to do the right thing with rows marked deleted - we cannot do read uncommitted.
          o in the immediate case do the lock-wait read scan.

          and in the deferred case do the lock-wait read scan at commit.

          In both scan cases it is possible that while we are scanning someone might add a duplicate row
          that the scan will miss, but I think that is ok. We just need to make sure all the rows as of "now" are being
          checked. And we ensure that because the scan positions at the leftmost matching key "now" and moves
          right. No "now" row can move left. It is up to whoever is inserting those later rows to do their own check.
          But I would appreciate everyone thinking about this.
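
          A sketch of the pruning step under those assumptions; the tri-state no-wait probe is an invented illustration, not an existing store interface:

              import java.util.Arrays;
              import java.util.HashSet;
              import java.util.List;
              import java.util.Set;

              final class DeferredCheckList {

                  /** Hypothetical no-wait probe: null means a lock could not be obtained. */
                  interface NoWaitProbe {
                      Boolean otherLiveRowExists(Object[] key);
                  }

                  private final Set<List<Object>> recheckAtCommit = new HashSet<>();

                  void afterInsert(NoWaitProbe probe, Object[] key) {
                      Boolean duplicate = probe.otherLiveRowExists(key);
                      if (duplicate == null || duplicate) {
                          // Either a real duplicate, or a lock we could not get without
                          // waiting: assume the worst. Missing one case here would
                          // corrupt the constraint, so err on the side of adding.
                          recheckAtCommit.add(Arrays.asList(key));
                      }
                      // duplicate == FALSE: provably unique as of "now"; later
                      // inserters must do their own check, so this key can skip the
                      // commit-time scan.
                  }
              }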

          Mike Matrigali added a comment - - edited

          It would be good, either as part of this project or as a follow-on, to enhance the consistency checker to make sure that keys in the indexes that support deferrable constraints are all unique within the appropriate guidelines
          (i.e. allowing duplicate nulls or not). If a table-level lock is held, then it should be true that there are no duplicates, even though the underlying implementation btree is a duplicate-allowing index.

          Being able to run this after some set of tests stressing the deferrable constraint code might make testing easier.

          Dag H. Wanvik added a comment - - edited

          > If you think the optimization is necessary do you think the following
          > can be proven correct, ie we will never miss a case where the row we
          > inserted is causing a duplicate key and should have been added to the
          > deferred check list:
          >
          > o do the insert first.
          >
          > o in deferred case do an immediate non-serialized, no hold, no lock
          > wait, read only scan of the keys. if duplicate add to list, and if
          > it could not get a lock add to the list. locks have to be acquired
          > for store to do the right thing with rows marked deleted - can not
          > do read uncommitted.

          Yes, I agree we wouldn't want to wait for the lock there; just adding it to the list under a "pessimistic" assumption is fine, so we'd check again at commit.

          >
          > o in immediate case do the lock wait, read scan.

          Yes, and report it as a duplicate on timeout or deadlock, as you suggest. I think that's better than reporting a lock timeout.

          >
          > and in deferred case do the lock wait read scan at commit.

          Yes, and again, report as duplicate if timeout or deadlock.

          > In both scan cases it is possible that while we are scanning someone
          > might add a duplicate row that the scan will miss, but I think that is
          > ok. We just need to make sure all the rows as of "now" are being
          > checked. And we insure that because the scan positions at the leftmost
          > matching key "now" and moves right. No "now" row can move left. It is
          > up to whoever is inserting those later rows to do their own check.
          > But I appreciate all to think about this.

          I believe this will be correct, but I will do some experiments.

          Dag H. Wanvik added a comment -

          > So I suggest that we highlight in the documentation for this project and for the individual new ddl
          > that using this feature will call for applications to have error handling for commit to fail.

          Yes, that will be good.

          Dag H. Wanvik added a comment -

          > it would be good either as part of this project or a follow on to enhance the consistency checker to make sure > keys in these indexes that support deferable constraints are all unique

          +1. If I don't get to it, I'll make a separate JIRA.

          Dag H. Wanvik added a comment -

          Btw, will the BTreeScan honor the flag OPENMODE_LOCK_NOWAIT? If so, when next() can't get a lock, will it throw or return null?

          Mike Matrigali added a comment -

          Looking at the latest patch, with changes to move the checking out of store and not special-casing the types of btree,
          I am wondering if the changes to the following files are still needed. I have not reviewed the changes in the
          latest patch yet, I just was not expecting changes to these files:
          M java\engine\org\apache\derby\iapi\store\access\SortObserver.java
          M java\engine\org\apache\derby\impl\store\access\sort\UniqueWithDuplicateNullsMergeSort.java
          M java\engine\org\apache\derby\impl\store\access\sort\MergeSort.java
          M java\testing\org\apache\derbyTesting\unitTests\store\T_SortController.java

          If they are still needed can you give a short explanation of why, to help with my review.

          Mike Matrigali added a comment -

          I just looked at the protocol documentation and I don't think currently the btree scan will do what you want, with respect to OPENMODE_LOCK_NOWAIT. It looks like it will do nowait on the table lock but not on the subsequent row locks.

          I would suggest, as an incremental approach, coding the deferred check using waiting locks for now, and
          we can optimize it to not wait in a subsequent patch. I think even in the nowait case the new interface should throw a lock timeout error if it can't get the lock.

          I need to think about what would be the best solution to providing this functionality. We can either enhance all scans to support lock nowait, which I think is a lot of work. Or, I think it would be easier and better for this feature in the long run to provide a new interface where your code would just pass in the key you are looking for,
          plus some params similar to openScan to tell what type of scan to use, in addition to a wait or no-wait flag. This would mean the duplicate check could be done in a single trip to store, rather than opening a scan and then doing a next. At the lowest level of the btree we often ask for nowait locks, so that part is not hard. But making it
          available for all scans would override the user setting for this. This is obviously a special case, so I think it is
          cleaner to make it clear it is a special case by using a different interface.

          I would be interested in implementing this interface as a subsequent patch, once your changes are checked in. Or if you want to implement it, I can help. Let me know which you prefer.
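
          For illustration, the single-trip probe described above might look roughly like this; the name and signature are hypothetical, not an existing store API (DataValueDescriptor and StandardException are the existing Derby types):

              public interface KeyProbe {
                  /**
                   * Single trip to store: position on the given key and report
                   * whether another row with the same key exists.
                   *
                   * @param key      the index key to probe for
                   * @param openMode scan-type params, as for openScan
                   * @param noWait   if true, fail fast instead of waiting on row locks
                   * @return true if a duplicate key was found
                   * @throws StandardException on lock timeout or deadlock
                   */
                  boolean probeForDuplicate(DataValueDescriptor[] key,
                                            int openMode,
                                            boolean noWait)
                          throws StandardException;
              }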

          Dag H. Wanvik added a comment - edited

          (re SortObserver & friends): Possibly not, I'll have a look. I guess for deferrable, we could check for duplicates in a separate scan, but again I modeled the checking on what was done for nullable unique.

          Dag H. Wanvik added a comment -

          > I just looked at the protocol documentation and I don't think currently the btree scan will do what you want,
          > with respect to OPENMODE_LOCK_NOWAIT. It looks like it will do nowait on the table lock but not on the
          > subsequent row locks.

          Thanks, Mike. That was what I suspected from my cursory investigation, yes. I'll code it with the waiting locks for now. I did see the lowest levels use nolock, though. If you'd like to help me make a special API for this, that would be great. I can certainly help with it too, should you be too busy to get it done. If you like, I can make a separate JIRA for it, so whoever gets to it first can grab that.

          Knut Anders Hatlen added a comment -

          >
          > o in immediate case do the lock wait, read scan.

          Yes, and report it as a duplicate on timeout or deadlock. I think that's better than reporting a lock timeout, which you suggested.

          >
          > and in deferred case do the lock wait read scan at commit.

          Yes, and again, report it as a duplicate on timeout or deadlock.

          If I understand this proposal correctly, an insert operation will throw duplicate key exception instead of lock timeout exception if it cannot obtain a lock during duplicate checking. If so, that doesn't match what we do in the non-deferrable case:

          ij> connect 'jdbc:derby:memory:db;create=true' as c1;
          ij> create table t(x int primary key);
          0 rows inserted/updated/deleted
          ij> autocommit off;
          ij> insert into t values 1;
          1 row inserted/updated/deleted
          ij> connect 'jdbc:derby:memory:db' as c2;
          ij(C2)> insert into t values 0;
          1 row inserted/updated/deleted
          ij(C2)> insert into t values 1;
          ERROR 40XL1: A lock could not be obtained within the time requested
          

          I agree that we should not throw lock timeout exception if we cannot get a lock immediately during the preliminary check in the deferred case. Then we should just add that key to the list of keys to check at commit time. But if we fail to get a lock in the final duplicate check (either immediate or deferred), I think it is correct to report that as a lock timeout, not as a constraint violation, as we don't know if it actually is a constraint violation until the other transactions have completed.

          Dag H. Wanvik added a comment -

          Created DERBY-6419 as a sub-task: Make BTree scan honor OPENMODE_LOCK_NOWAIT for row locks.

          Dag H. Wanvik added a comment -

          Thanks, Knut. Seeing that we get a lock time-out also in the non-deferrable case, I tend to agree. So, in summary: at insert time, presume a duplicate and check later; at commit/check time, report a time-out if we can't get the lock.

          Dag H. Wanvik added a comment - edited

          (re SortObserver & friends): I think the changes are pretty small, and the alternative would be to make separate sorting passes (one for each deferrable constraint/index on a table) with new code to check the deferrables; I'm not convinced that's worth the extra code and redundancy. For the moment, I'd like to keep these changes.

          Dag H. Wanvik added a comment - edited

          Attaching derby-532-post-scan-1, which replaces the serializable check-before-insert scan with an unconditional insert (in IndexChanger#insertAndCheckDups) plus a post-insert scan that checks for duplicates. This check presently uses a scan that will wait if a row is locked.
          A time-out on insert presumes a duplicate and schedules a commit-time check.
          This will change when we implement the optimization described in DERBY-6419. A time-out (or deadlock) at commit time will be reported as such. A change of constraint mode to immediate that can't get a lock will also report a time-out or deadlock.

          Re-running regressions.

          Dag H. Wanvik added a comment -

          Regressions passed. I intend to commit this in a day or two modulo review comments.

          Mike Matrigali added a comment -

          I plan on doing one more review of the whole package. I am ok with the lock timeout decision, but I am concerned that 2 duplicate inserters to a deferrable, deferred constraint are, I think, guaranteed to deadlock. But I don't have a good answer to that.

          I will post comments to the new issue about the new interface we talked about. Let's get this part in first.

          Mike Matrigali added a comment -

          First half of review: really only questions and requests for more comments; the code so far looks fine. It would be
          good to get someone with more language expertise to look at that part. I can probably do a good job reviewing
          the changes that are there - but someone else might need to review to think about the changes that aren't there
          that might be needed. I am ok with leaving the sort observer stuff.

          Index: java/engine/org/apache/derby/catalog/IndexDescriptor.java
          nit: maybe would be good to add comments here describing that unique
          deferrable indexes are implemented using store non-unique indexes, and
          that sql layer owns enforcing uniqueness.

          Index: java/engine/org/apache/derby/catalog/types/IndexDescriptorImpl.java
          nit: same comment as above about more detail about deferrable indexes.

          some document table describing the use of the following, associating it with
          deferred and non-deferred and the different constraint types, might help to make
          it clearer:

          boolean isUnique,
          boolean isUniqueWithDuplicateNulls,
          boolean isUniqueDeferrable,
          boolean hasDeferrableChecking,

          Index: java/engine/org/apache/derby/iapi/sql/conn/LanguageConnectionContext.java

          no comment.

          Index: java/engine/org/apache/derby/iapi/sql/conn/SQLSessionContext.java
          no comment.

          Index: java/engine/org/apache/derby/iapi/sql/conn/StatementContext.java
          no comment.

          java/engine/org/apache/derby/iapi/sql/dictionary/IndexRowGenerator.java
          nit: same comment about expanding on what isUniqueDeferrable means for the index

          java/engine/org/apache/derby/iapi/store/access/SortObserver.java
          would be nice for the 3 new interfaces to have javadoc comments for each,
          explaining their use in more detail.

          Index: java/engine/org/apache/derby/impl/sql/catalog/DataDictionaryImpl.java
          +1
          nice that you got rid of the old version check here.

          Index: java/engine/org/apache/derby/impl/sql/execute/AlterConstraintConstantAction.java
          o would be good to have tests that alter back and forth from deferred and
          not deferred and make sure subsequent actions do the right thing. Looks
          like the existing mechanism should work, but seems a good code path to
          test.

          Index: java/engine/org/apache/derby/impl/sql/execute/CreateIndexConstantAction.java
          some comment on why sharing is turned off for deferrable constraints would
          be good. A user is going to complain at some point when they see duplicate
          indexes taking up twice the space that they don't think they need.

          Index: java/engine/org/apache/derby/impl/sql/execute/DeferredDuplicates.java

          o This code throws SQLState.LANG_DEFERRED_DUPLICATE_KEY_CONSTRAINT_T or
          SQLState.LANG_DEFERRED_DUPLICATE_KEY_CONSTRAINT_S. If these are not
          caught and remapped, can someone comment on whether this follows the
          standard? Is it ok that this type of constraint will throw a different
          error than previous constraints? I think it is ok with the standard, as
          they all start with 23xxx.

          can you comment on statement vs transaction level rollback. While thinking
          about this I have always assumed transaction level rollback to get the
          rows out of the table. I think statement level must be when savepoints
          are involved.

          Knut Anders Hatlen added a comment -

          > I am ok with the lock timeout decision, but I am concerned that 2 duplicate inserters to a deferrable, deferred constraint are, I think, guaranteed to deadlock. But I don't have a good answer to that.

          If the main motivation for not doing a serializable scan up front was to avoid the potential of lock timeouts, and the result is that it guarantees deadlocks, maybe we should reconsider the serializable scan? The range lock obtained by the serializable scan will make competing transactions block earlier, and at least avoid deadlock in the simple case of two transactions doing a single-row insert of the same key at the same time.

          Dag H. Wanvik added a comment -

          Thanks for the comments, Mike. Please see my replies inline.

          > Index: java/engine/org/apache/derby/catalog/IndexDescriptor.java
          > nit: maybe would be good to add comments here describing that unique
          > deferrable indexes are implemented using store non-unique indexes, and
          > that sql layer owns enforcing uniqueness.

          Done, cf. attachment IndexDescriptor.html and IndexDescriptorImpl.html

          >
          > Index: java/engine/org/apache/derby/catalog/types/IndexDescriptorImpl.java
          > nit: same comment as above about more detail about deferrable indexes.
          >
          > some document table describing the use of the following, associating it with
          > deferred and non-deferred and the different constraint types, might help to make
          > it clearer:
          >
          > boolean isUnique,
          > boolean isUniqueWithDuplicateNulls,
          > boolean isUniqueDeferrable,
          > boolean hasDeferrableChecking,

          See above.

          > java/engine/org/apache/derby/iapi/sql/dictionary/IndexRowGenerator.java
          > nit: same comment about expanding on what isUniqueDeferrable means for the index

          Fixed, see attached IndexRowGenerator.html.

          >
          > java/engine/org/apache/derby/iapi/store/access/SortObserver.java
          > would be nice for the 3 new interfaces to have javadoc comments for each,
          > explaining their use in more detail.

          Added, see SortObserver.html

          > Index: java/engine/org/apache/derby/impl/sql/execute/AlterConstraintConstantAction.java
          >
          > o would be good to have tests that alter back and forth from deferred and
          > not deferred and make sure subsequent actions do the right thing. Looks
          > like the existing mechanism should work, but seems a good code path to
          > test.

          I have one test already, and added more in ConstraintCharacteristicsTest#testBasicDeferral along these lines and below:

          // Try to set immediate mode, and detect violation
          assertStatementError("23507", s, sCF + " immediate");
          // Once more, error above should not roll back
          assertStatementError("23507", s, sCF + " immediate");

          >
          > Index: java/engine/org/apache/derby/impl/sql/execute/CreateIndexConstantAction.java
          > o some comment on why sharing is turned off for deferrable constraints would
          > be good. A user is going to complain at some point when they see duplicate
          > indexes taking up twice the space that they don't think they need.

          Added a comment. I guess we could introduce sharing with other non-unique indexes in a follow-up patch, now that all checking happens in the language layer.

          >
          > Index: java/engine/org/apache/derby/impl/sql/execute/DeferredDuplicates.java
          >
          > o This code throws SQLState.LANG_DEFERRED_DUPLICATE_KEY_CONSTRAINT_T or
          > SQLState.LANG_DEFERRED_DUPLICATE_KEY_CONSTRAINT_S. If these are not
          > caught and remapped, can someone comment on whether this follows the
          > standard? Is it ok that this type of constraint will throw a different
          > error than previous constraints? I think it is ok with the standard, as
          > they all start with 23xxx.

          These are indeed thrown. As far as I can understand, the prefix 23 is correct.

          > can you comment on statement vs transaction level rollback. While thinking
          > about this I have always assumed transaction level rollback to get the
          > rows out of the table. I think statement level must be when savepoints
          > are involved.

          The _S variant doesn't roll back - the transaction is still active, and if the offending rows aren't deleted, the commit will throw the _T variant and roll back. This is tested in ConstraintCharacteristicsTest#testBasicDeferral.
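
          An illustrative ij session of the two severities (a sketch only: it assumes the SET CONSTRAINTS syntax from the attached spec, the 23507 state used in the test above, and the derby.constraintsTesting property being set):

              autocommit off;
              create table t(x int not null, constraint c primary key (x) deferrable);
              set constraints c deferred;
              insert into t values 1, 1;   -- duplicate tolerated while deferred
              set constraints c immediate; -- throws the _S variant (statement
                                           -- severity); the transaction survives
              delete from t where x = 1;   -- remove the offending rows ...
              commit;                      -- ... or commit throws the _T variant
                                           -- (transaction severity) and rolls back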

          Dag H. Wanvik added a comment - edited

          Attaching derby-532-post-scan-3, which fixes two bugs: 1. InsertResultSet didn't test the effective deferred status, just initiallyDeferred. Renamed the variables there and in the sort observers accordingly. 2. Changed the order in which the DEFERRED ALL status and the map of individual constraint modes get propagated to child SQL session contexts (in this case, IMPORT_TABLE), so that individual modes don't get clobbered when we inherit the ALL state. This bug was exposed when we started checking the effective deferred status in InsertResultSet. Updated the test to include this case.
          Rerunning regressions.

          Dag H. Wanvik added a comment -

          Attaching derby-532-post-scan-4, which makes some minor improvements to the tests.

          Dag H. Wanvik added a comment -

          Uploading a new version of the functional specification, 1.6. Minor changes only.

          ASF subversion and git services added a comment -

          Commit 1545394 from Dag H. Wanvik in branch 'code/trunk'
          [ https://svn.apache.org/r1545394 ]

          DERBY-532 Support deferrable constraints

          Patch derby-532-post-scan-4 implements basic support for deferred
          constraints for PRIMARY KEY and UNIQUE constraints. Deferrable
          constraints are not enabled by default yet; one needs to set a
          property to try the feature: "derby.constraintsTesting".

          This patch enables deferred constraints for:

          a) primary key constraints
          b) unique constraint with not nullable columns
          c) unique constraint with nullable columns

          by new logic in insertion and sorts.

          The patch includes relaxing the constraint at insertion and update
          time, as well as adding a constraint to an existing table.

          Derby treats constraints a) and b) the same, and in the code these are
          marked as "unique" when they are not deferrable (as in existing code).

          Constraint type c) is currently marked as
          "uniqueWithDuplicateNulls". Insert/update of these is implemented in
          the BTree by including the RowLocation of the base row in the set of
          keys in the index row (DERBY-3330). This makes them trivially unique,
          but there is an extra code path in BTreeController that checks
          neighbor rows for duplicates, and only allows insertion if the key
          contains a null. When adding a constraint to an existing table, these
          are handled by a specially crafted sorter
          (UniqueWithDuplicateNullsMergeSort).

          The implementation of insert/update of deferrable indexes is based on
          a similar approach, i.e. by backing with a non-unique index, and checking
          duplicates in the language layer, notably IndexChanger.

          In IndexChanger, after inserting a row, we check if it is unique by
          performing a scan of the BTree. A time-out here leads to a pessimistic
          assumption that there is a duplicate. Duplicate key values are saved
          until checking time (usually commit time), when a new scan is
          performed to validate the uniqueness property.

          [This means a) and b) if deferrable are no longer marked "unique"].

          Deferrable indexes are not shared.

          If there are duplicates and we have deferred constraint mode (a
          dynamic session property), we save the duplicate index row in a disk
          based hash table (DeferredDuplicates#rememberDuplicate).

          For a) and b), constraints which are deferrable are marked as
          "uniqueDeferrable" and "hasDeferrableChecking". Constraints of type c)
          which are deferrable are marked "uniqueWithDuplicateNulls" and
          "hasDeferrableChecking". These marks determines the code paths
          used. Note that existing indexes and non-deferrable constraint do not
          get a new code path, which should preserve correctness and performance
          of those.

          Now, with these markers in place, deferral of checks happens in three
          places:

          IndexChanger#insertAndCheckDups

          CreateIndexConstantAction#executeConstantAction plus
          MergeSort#compare and UniqueWithDuplicateNullsMergeSort#compare

          InsertResultSet#setUpAllSorts

          The first is active for deferral under INSERT and UPDATE. The second
          when adding a deferrable constraint to an existing table, when we sort
          existing rows, detecting any duplicates. The last is used when importing
          rows.

          At transaction commit (1), or when the constraint mode for a deferred
          constraint is changed back to immediate (2), we validate the
          constraint (DeferredDuplicates#validate) by replaying the hash table
          and scanning the index for the duplicate index rows to ascertain there
          are none, or else we have an error: transaction or statement severity
          respectively for (1) and (2).

          The constraint mode is a SQL session level variable, and inside
          routines (nested connections), we push this on the stack. This means
          change of the constraint mode inside nested connections will be popped
          on routine exit. If, as part of this, a constraint changes from
          deferred to immediate mode, we also validate it for correctness. If
          this fail, the transaction rolls back
          We needed to do this from a newly introduced method,
          GenericLanguageConnectionContext#popNestedSessionContext. This
          pops the SQL session context.
          That hook is called from GenericPreparedStatement#executeStmt. As a
          part of this effort, we also renamed #setupNestedSessionContext to
          #pushNestedSessionContext.

          The patch also adds support for checking deferred constraints in
          xa_prepare and xa_commit (.., true), cf. specification attached to the
          JIRA issue.

          Concurrency: if a transaction gets a lock time-out when trying to
          establish if a row just inserted is a duplicate (another transaction
          may have just inserted a row with a similar index key), we use a
          pessimistic assumption and add that key to the set of keys to be
          checked at commit time. If the lock on a key can't be grabbed then, a time-out is
          thrown. We plan to add an optimized scan to avoid waiting for the lock
          at insertion time, cf DERBY-6419.

          The "not enforced" feature is not yet implemented in this patch.

          Several new test cases have been added to ConstraintCharacteristicsTest to
          test these basic behaviors.
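
          In outline, the commit-time validation can be pictured like this. This is a sketch with hypothetical helpers (countIndexRowsMatching, violationException); the real logic lives in DeferredDuplicates#validate and replays a disk-based hash table rather than a java.util.List:

              void validate(List<DataValueDescriptor[]> savedDuplicates,
                            boolean atCommit) throws StandardException {
                  for (DataValueDescriptor[] key : savedDuplicates) {
                      // waiting scan of the index for this key
                      if (countIndexRowsMatching(key) > 1) {
                          // (1) at commit: transaction severity
                          // (2) on switch back to immediate: statement severity
                          throw violationException(key, atCommit);
                      }
                  }
              }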

          Dag H. Wanvik added a comment -

          Committed derby-532-post-scan-4 as svn version 1545394.

          Dag H. Wanvik added a comment -

          Running an experiment setting all unique and pk constraints as deferrable [1] (but not deferred) for the existing regression tests to validate any differences in behavior.

          [1] by instrumenting the code with the patch derby-532-test-with-default-deferrable-all-over.diff

          Dag H. Wanvik added a comment - edited

          Here is one fallout of running the regressions with default constraint mode deferrable.

          Uploading a patch ("derby-532-fix-metadata-1") that fixes broken database metadata for deferred constraint indexes: the metadata query used the method IndexDescriptor#isUnique to determine logical uniqueness, but that method now represents physical uniqueness. For deferred unique constraints, the method that should be used is "isUniqueDeferrable". Added a test, and also added a client/server run of the regression test for deferred constraints.

          Before the fix, the added test fixture "testDatabaseMetaData" failed in that the index in question was identified as non-unique, even though it is logically unique.

          When changing the code for the database metadata query, I discovered DERBY-6423.
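
          The gist of the fix, as a sketch (not the actual patch text): wherever the metadata code decides logical uniqueness, the predicate needs to widen from isUnique() to also accept deferrable unique indexes:

              // logical uniqueness for metadata purposes (sketch)
              boolean logicallyUnique(IndexDescriptor id) {
                  return id.isUnique() || id.isUniqueDeferrable();
              }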

          Dag H. Wanvik added a comment - edited

          Here are the rest of the failures and errors I saw running the JUnit tests with default deferrable (the experiment mentioned above). I'll be analyzing them to see whether or not they are to be expected.

          (Note: this run included the proposed patch for DERBY-6419 as well)

          [Update 2013-12-09 Analysis added for all issues]

          • LangProcedureTest (lock difference)

            => To be expected: we are running with serializable isolation mode
            and the index used is no longer physically unique, so the
            query reads the previous key also. Locks expected in the non-deferred case:

              XID,TYPE,MODE,TABLENAME,LOCKNAME,STATE,TABLETYPE,LOCKCOUNT,INDEXNAME
              --- ---- ---- --------- -------- ----- --------- --------- ---------
              {345,TABLE,IS,T1,Tablelock,GRANT,T,2,null}
              {345,ROW,S,T1,(1,8),GRANT,T,1,null}
              

            what we see in the deferrable case is:

              XID,TYPE,MODE,TABLENAME,LOCKNAME,STATE,TABLETYPE,LOCKCOUNT,INDEXNAME
              --- ---- ---- --------- -------- ----- --------- --------- ---------
              {503,TABLE,IS,T1,Tablelock,GRANT,T,2,null}
              {503,ROW,S,T1,(1,7),GRANT,T,1,null}
              {503,ROW,S,T1,(1,8),GRANT,T,1,null}
              
          • LangScripts update (lock difference)

            => To be expected. We have this table:

               create table tab1 (c1 int not null primary key, c2 int)
               insert into tab1 values (1, 8)
             

            and then we do:

               update tab1 set c2 = c2 + 3 where c1 = 1;
             

            Read committed, deferrable

              XID,TYPE,MODE,TABLENAME,LOCKNAME,STATE,TABLETYPE,LOCKCOUNT,INDEXNAME
              --- ---- ---- --------- -------- ----- --------- --------- ---------
              {184,TABLE,IX,TAB1,Tablelock,GRANT,T,2,null}
              {184,ROW,X,TAB1,(1,7),GRANT,T,3,null}
             

            Serializable, deferrable

              XID,TYPE,MODE,TABLENAME,LOCKNAME,STATE,TABLETYPE,LOCKCOUNT,INDEXNAME
              --- ---- ---- --------- -------- ----- --------- --------- ---------
             {186,TABLE,IX,TAB1,Tablelock,GRANT,T,2,null}
             {186,ROW,X,TAB1,(1,3),GRANT,T,1,null}
             {186,ROW,X,TAB1,(1,7),GRANT,T,3,null}
             

            i.e. we also have a lock on the first (control) row (the row before the
            row containing 1). In the non-deferrable case, again, we have a unique
            physical index on c1, so a previous-key lock is not required. Note, for
            read committed there is no difference.

          • LojReorderTest (different execution plan)

            => To be expected. The original plan looked like this:

            	     
                 "Sort ResultSet:",
                 "Source result set:",
                 "_Hash Left Outer Join ResultSet:",
                 "_Left result set:",
                 "__Nested Loop Left Outer Join ResultSet:",
                 "__Left result set:",
                 "___Nested Loop Left Outer Join ResultSet:",
                 "___Left result set:",
                 "____Index Scan ResultSet for A",
                 "___Right result set:",
                 "____Index Scan ResultSet for B",
                 "__Right result set:",
                 "___Index Scan ResultSet for C",
                 "_Right result set:",
                 "__Hash Scan ResultSet for D"});
              

            With deferrable constraints it looks like this:

                "Sort ResultSet:",
                "Source result set:",
                "_Hash Left Outer Join ResultSet:",
                "_Left result set:",
                "__Nested Loop Left Outer Join ResultSet:",
                "__Left result set:",
                "___Hash Left Outer Join ResultSet:",
                "___Left result set:",
                "____Index Scan ResultSet for A",
                "___Right result set:",
                "____Hash Scan ResultSet for B",
                "__Right result set:",
                "___Index Scan ResultSet for C",
                "_Right result set:",
                "__Hash Scan ResultSet for D"});
              

            As we can see, in the deferrable case the inner (leftmost) LOJ uses a hash
            left outer join with an index scan for A and a hash scan for B,
            whereas the non-deferrable case uses a nested loop over two index
            scans for A and B, presumably because the physical uniqueness of the
            index on B gives the optimizer another cardinality estimate. In any
            case, both plans are good.

          • InListMultiProbeTest (assertTrue(rtsp.usedIndexScan()))

            Test case testDerby6045WithoutUpdateStatistics fails.
            In this case, Derby picks a table scan instead of an index scan for the
            multi-probe result set. Using Rick's tools to trace the optimizer, I see
            that the selectivity for the start/stop predicate on TERM_ID increases
            from 0.1 in the non-deferrable case to 1.0 in the deferrable case.

              s.executeQuery("SELECT  *  FROM  " + whiteSpace + 
                                DERBY_6045_DATA_TABLE + 
                                " WHERE (TERM_ID = 11) OR " +
                                "(TERM_ID =21) OR (TERM_ID = 31)");
              

            Presumably this makes the optimizer discard the index scan in favor of a table scan, cf. this quote from DERBY-47:

              So in the context of row counts, if the number of IN-list predicates
              multiplied by the estimated row count (after stat selectivity is applied)
              yields a high percentage row count (ex. all rows in the table) then the odds
              of the optimizer choosing to use that particular index are lower. It may
              still choose to use the index, in which case multi-probing will take effect,
              but it probably will not (it all depends).
              

            Perhaps this could (should?) be improved upon. For a "normal" non-unique index this may be
            the right choice, but it does seem a bit pessimistic for the deferrable case.

          • CollationTest2 (java.sql.SQLIntegrityConstraintViolationException:
            The statement was aborted because it would have caused a duplicate
            key value in a unique or primary key constraint or unique index
            identified by 'SQL131129120107383' defined on 'DERBY_5367'.)

            Not an issue with the last patch.

          • ConglomerateSharingTest (number of physical conglomerates that exist
            for the received table: Expected: >3< Found: >4<)

            We have the following scenario:

                    create table dropc_t2 (a int, b int not null, c int not null)
                    create index dropc_ix1 on dropc_t2 (a,b)
                    create unique index dropc_uix2 on dropc_t2 (c)
                    alter table dropc_t2 add constraint dropc_uc1 unique (c)
                    alter table dropc_t2 add constraint dropc_fk0 
                        foreign key (a,b) references dropc_t0
                    alter table dropc_t2 add constraint dropc_fk1 
                        foreign key (a,b) references dropc_t0
                    alter table dropc_t2 add constraint "dropc_fk2 
                        foreign key (c) references dropc_t1
            
                    /* Should have 3 conglomerates on DROPC_T2:
                     *
                     *  1. Heap
                     *  2. DROPC_IX1 (shared by: DROPC_FK0, DROPC_FK1)
                     *  3. DROPC_UIX2 (shared by: DROPC_UC1, DROPC_FK2)
                     */
              

            Here the unique constraint DROPC_UC1 is deferrable by default, so it doesn't
            share the unique index DROPC_UIX2. This is as designed.

          • NullableUniqueConstraintTest (junit.framework.ComparisonFailure:
            Unexpected SQL state. expected:<[23505]> but was:<[XJ001]>)
            if (SanityManager.DEBUG) { // deferrable: we use a non-unique index
                SanityManager.ASSERT(
                    insertStatus != ConglomerateController.ROWISDUPLICATE); // <=====!!
            }

            ++ more

            To be expected. This fails in the test case testMixedInsertDelete,
            which mentions an earlier problem with this fixture:
            DERBY-4097. Since we are now running a BTree scan to check for the
            duplicate instead of the check on the left sibling, as done for
            unique not null in the non-deferrable case, we don't get to retry if
            we can't lock the row. We lose here because we time out trying to get
            a lock, for the same reason as described in DERBY-4097: a conflict
            with the post-commit work done by another thread. It happens in
            iteration four. As discussed in the thread, the issue can be avoided
            by turning off autocommit. Another thing: in this case, it
            would be better to report the time-out rather than report it as a
            duplicate, I think, since we are running with immediate checking.

          • UniqueConstraintSetNullTest
            (java.sql.SQLIntegrityConstraintViolationException: The statement
            was aborted because it would have caused a duplicate key value in a
            unique or primary key constraint or unique index identified by
            'U_CON' defined on 'CONSTRAINTEST'.)
            ++ more

            This exposed a bug: a broken predicate when recreating the index
            when going from UNIQUE NOT NULL to plain UNIQUE: the existing test
            missed the deferrable case, so the index was not recreated.
            Fixed in the attached patch derby-532-fix-drop-not-nullable, which also
            adds a test case for this.

          • UniqueConstraintMultiThreadedTest
            (junit.framework.AssertionFailedError: isolation levels: 1 1)

            This exposed a bug with the introduction of the no-wait scan, cf.
            DERBY-6419: we should only return immediately from the checking
            scan on a timeout or deadlock if the constraint mode is deferred.
            For immediate deferrable constraints, the check should wait and
            obey the usual timeout values. This is fixed with a follow-up
            patch to DERBY-6419.

          • XplainStatisticsTest ( expected rows:
            [[COUNTRIES_UNQ_NM, C, BTREE, RC, 1, 1, 1, SH, R, 2, ALL]]
            actual result:
            [[COUNTRIES_UNQ_NM, C, BTREE, RC, 2, 1, 1, IS, R, 2, ALL]])

            The difference here, as far as I can see, consists of two items:

            • two rows are read via the index instead of one (OK, I think)
            • the lock mode is "instantaneous"; in the non-deferrable case I see
              the following in the execution plan:
                  Index Scan ResultSet for COUNTRIES using constraint COUNTRIES_UNQ_NM at
                  read committed isolation level using share row locking chosen by the optimizer
                  

              whereas in the deferrable case I see:

                  Index Scan ResultSet for COUNTRIES using constraint COUNTRIES_UNQ_NM at 
                  read committed isolation level using instantaneous share row locking chosen by the optimizer
                  

              I presume this is because we are now using a non-unique scan,
              but I am not 100% sure. For now, I'll treat this as an "expected
              difference".

          • dml019(org.apache.derbyTesting.functionTests.tests.nist.NistScripts)
            failed: junit.framework.ComparisonFailure: Output at line 85
            expected:<E1 |P[1 |4]0 > but was:<E1 |P[2 |2]0 >

            To be expected: the following statement returned the correct rows,
            but the ordering of the result set differed from the master due to
            a different execution plan; since the query has no ORDER BY, any
            row order is legal.

               SELECT PNUM,EMPNUM                      
                      FROM WORKS                              
                      GROUP BY EMPNUM,PNUM,HOURS;
              
          • UpdateStatisticsTest (junit.framework.AssertionFailedError: failed
            to get statistics for table TEST_TAB_1 (#expected=0, timeout=0)
            Index statistics for TEST_TAB_1 1:
            {tableId=2f2b17ef-0142-a377-c079-0000766f8c2f, tableName=TEST_TAB_1, indexName=SQL131129120747740, lcols=1, rows=2, unique/card=2, created=2013-11-29 12:07:47.742}

            expected:<0> but was:<1>)

            To be expected: there were four errors here, all instances of the
            same issue: we found statistics where none were expected, due to
            DERBY-3790 ("Investigate if request for update statistics can be
            skipped for certain kind of indexes, one instance may be unique
            indexes based on one column."). Since primary key indexes are no
            longer physically unique when the constraint is deferrable, the
            skipping of the statistics generation did not happen.
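
          As a side note, the autocommit workaround mentioned under
          NullableUniqueConstraintTest above would look roughly like this in
          JDBC (a minimal sketch; the table, database URL and class name are
          invented for illustration):

              import java.sql.Connection;
              import java.sql.DriverManager;
              import java.sql.SQLException;
              import java.sql.Statement;

              public class AutoCommitWorkaround {
                  public static void main(String[] args) throws SQLException {
                      Connection c = DriverManager.getConnection(
                              "jdbc:derby:memory:demo;create=true");

                      // With autocommit on, every statement commits at once,
                      // and the post-commit work (e.g. reclaiming deleted
                      // index rows) can race with the next statement's
                      // duplicate check, causing lock timeouts as in
                      // DERBY-4097. Keeping the statements in one transaction
                      // avoids that race.
                      c.setAutoCommit(false);

                      try (Statement s = c.createStatement()) {
                          s.execute("create table t "
                                  + "(x int, constraint u unique (x))");
                          s.execute("insert into t values (1)");
                          s.execute("delete from t");
                          s.execute("insert into t values (1)");
                          c.commit();
                      } finally {
                          c.close();
                      }
                  }
              }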

          Dag H. Wanvik added a comment -

          Uploading two patches to fix fallout from the regressions run with default deferrable constraints: derby-532-nullableUniqueFix and derby-532-fix-drop-not-nullable. New test cases were added. Rerunning regressions.

          Dag H. Wanvik added a comment -

          Regressions ran ok with the two latest patches.

          Dag H. Wanvik added a comment -

          Uploading derby-532-test-speedup. Quote from my sandbox commit:

          Patch derby-532-test-speedup changes ConstraintCharacteristicsTest to
          use a main-memory database for some tests, for increased speed. It
          also changes the way SystemPropertyTestSetup for static properties
          shuts down the database, so that it does not deregister the driver;
          without this change we saw a test setup try to connect via the client
          driver to a Derby server engine without a registered driver.

          Regressions ran OK with this patch.

          ASF subversion and git services added a comment -

          Commit 1550113 from Dag H. Wanvik in branch 'code/trunk'
          [ https://svn.apache.org/r1550113 ]

          DERBY-532 Support deferrable constraints

          A patch ("derby-532-fix-metadata-1") that fixes broken database
          metadata for deferred constraint indexes: the metadata query used the
          method IndexDescriptor#isUnique to determine logical uniqueness, but
          it really represents physical uniqueness now. For deferred unique
          constraints, the method that should be used is
          "isUniqueDeferrable". Added a test, and also added client/server run
          of the regression test for deferred constraints.

          Before the fix, the added test fixture "testDatabaseMetaData" failed
          because the index in question was identified as non-unique, even
          though it is logically unique and so should be reported as such.

          ASF subversion and git services added a comment -

          Commit 1550152 from Dag H. Wanvik in branch 'code/trunk'
          [ https://svn.apache.org/r1550152 ]

          DERBY-532 Support deferrable constraints

          Patch derby-532-nullableUniqueFix. When we changed the implementation
          away from special treatment of deferrable constraints in the BTree, a
          couple of extra predicates that needed to be added were omitted;
          added those here: we should not mark the physical index with
          "uniqueWithDuplicateNulls" if it is deferrable. This error was found
          when running the regressions with default deferrable for all pk and
          unique constraints.

          We also removed an unused flag "hasDeferrableChecking" in the same
          places (it is no longer used by the physical index).

          Added a new test case, testCompressTable, which tests the
          "uniqueWithDuplicateNulls" case.

          We also change the behavior in the following way for deferrable, but
          not deferred, constraints: if we hit a time-out or dead-lock when
          checking uniqueness (in the BTree scan), we throw that time-out or
          dead-lock. Up till now we converted it to a duplicate exception. We
          will only assume it can be a duplicate for later checking iff the
          constraint mode is deferred.
          ASF subversion and git services added a comment -

          Commit 1550284 from Dag H. Wanvik in branch 'code/trunk'
          [ https://svn.apache.org/r1550284 ]

          DERBY-532 Support deferrable constraints

          Patch derby-532-fix-drop-not-nullable. Fixes a broken predicate when
          recreating the index when going from UNIQUE NOT NULL to plain UNIQUE:
          the existing predicate missed the deferrable case, so the index was not
          recreated.

          Added a test case.

          ASF subversion and git services added a comment -

          Commit 1550299 from Dag H. Wanvik in branch 'code/trunk'
          [ https://svn.apache.org/r1550299 ]

          DERBY-6419 Make BTree scan honor OPENMODE_LOCK_NOWAIT for row locks

          A follow-up patch: derby-6419-followup.

          Only short circuit waiting for lock in BTree scan to check duplicates
          for a deferred unique/pk constraint if constraint mode is deferred
          (i.e. not if immediate).

          Added a test case lifted from UniqueConstraintMultiThreadedTest, which
          exposed the issue when we ran the regressions with constraints
          deferrable by default (see DERBY-532).
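
          In other words, the lock-wait decision for the duplicate-checking
          scan now goes roughly like this (an illustrative sketch with invented
          names and a placeholder flag value; not the actual Derby code):

              public class ScanOpenModeSketch {
                  // Placeholder value; stands in for the real
                  // OPENMODE_LOCK_NOWAIT constant honored by the BTree scan.
                  static final int OPENMODE_LOCK_NOWAIT = 0x1000;

                  static int scanOpenMode(boolean constraintModeDeferred,
                                          int baseMode) {
                      if (constraintModeDeferred) {
                          // Deferred: don't wait for row locks; a timeout or
                          // deadlock is treated as a possible duplicate to be
                          // rechecked at commit time.
                          return baseMode | OPENMODE_LOCK_NOWAIT;
                      }
                      // Immediate (but deferrable): wait, obeying the usual
                      // lock timeout values, and surface any timeout or
                      // deadlock as such.
                      return baseMode;
                  }
              }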

          ASF subversion and git services added a comment -

          Commit 1550308 from Dag H. Wanvik in branch 'code/trunk'
          [ https://svn.apache.org/r1550308 ]

          DERBY-532 Support deferrable constraints

          Patch derby-532-test-speedup changes ConstraintCharacteristicsTest to
          use a main-memory database for some tests, for increased speed. It
          also changes the way SystemPropertyTestSetup for static properties
          shuts down the database, so that it does not deregister the driver;
          without this change we saw a test setup try to connect via the client
          driver to a Derby server engine without a registered driver.

          Dag H. Wanvik added a comment -

          Uploading patch derby-532-allow-pk-unique-1, which opens up the use of deferrable primary key and unique constraints: it is no longer required that the special property "derby.constraintsTesting" be used for those constraints, since the implementation is complete modulo bugs. Upgrade tests still remain to be built, though.

          For foreign key and check constraints, as well as for "not enforced", the property will still be required until the implementation for those is completed.
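
          For illustration, with this patch applied a deferrable primary key
          can be exercised along these lines (a minimal sketch assuming the
          SQL standard constraint characteristics and SET CONSTRAINTS syntax
          targeted by this issue; the table and names are invented):

              import java.sql.Connection;
              import java.sql.DriverManager;
              import java.sql.Statement;

              public class DeferredPkSketch {
                  public static void main(String[] args) throws Exception {
                      Connection c = DriverManager.getConnection(
                              "jdbc:derby:memory:demo;create=true");
                      c.setAutoCommit(false);

                      try (Statement s = c.createStatement()) {
                          s.execute("create table t (i int not null, "
                                  + "constraint t_pk primary key (i) "
                                  + "deferrable initially immediate)");
                          s.execute("insert into t values (1), (2)");
                          c.commit();

                          // Defer checking of T_PK until commit (or until
                          // the constraint is set back to immediate).
                          s.execute("set constraints t_pk deferred");

                          // With immediate checking this update might fail on
                          // the transient duplicate (1 -> 2 while 2 still
                          // exists); deferred, only the final state matters.
                          s.execute("update t set i = i + 1");

                          c.commit(); // checked here: no duplicates remain
                      } finally {
                          c.close();
                      }
                  }
              }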

          ASF subversion and git services added a comment -

          Commit 1555006 from Dag H. Wanvik in branch 'code/trunk'
          [ https://svn.apache.org/r1555006 ]

          DERBY-532 Support deferrable constraints

          Patch derby-532-allow-pk-unique-1, which opens up the use of
          deferrable primary key and unique constraints: it is no longer
          required that the special property "derby.constraintsTesting" be used
          for those constraints, since the implementation is complete modulo
          bugs. Upgrade tests still remain to be built, though.

          For foreign key and check constraints, as well as for "not enforced",
          the property will still be required until the implementation for
          those is completed.

          Dag H. Wanvik added a comment -

          Uploading a patch with upgrade tests: derby-532-upgrade-1. It checks that deferrable constraints cannot be used unless a hard upgrade has happened.

          Dag H. Wanvik added a comment -

          Uploading derby-532-upgrade-1b, which slightly changes the upgrade test: we only run the test with upgrades from 10.4 or later, since the nullable unique constraint wasn't added before that release. Regressions now pass with all old-version jars present.

          ASF subversion and git services added a comment -

          Commit 1555724 from Dag H. Wanvik in branch 'code/trunk'
          [ https://svn.apache.org/r1555724 ]

          DERBY-532 Support deferrable constraints

          Patch derby-532-upgrade-1b. It checks that deferrable constraints
          cannot be used unless a hard upgrade has happened.

          Dag H. Wanvik added a comment - - edited

          Uploading a first patch derby-532-check-constraints-1 which implements deferred check constraints and supporting tests. Comments are welcome.

          The high-level approach is as follows. When a violation occurs, we note the row location, in the base table, of the offending row. At commit time (or when switching a constraint back to immediate), we revisit those rows using the saved row locations, if they are still valid, and validate them again. This is achieved by positioning to the saved row locations in combination with a specially crafted result set, ValidateCheckConstraintResultSet (see ProjectRestrictResultSet#getNextRowCore), which positions to the offending base row using positionScanAtRowLocation before reading the row. If the row locations are no longer valid, e.g. because an intervening compress happened, we do a full table scan to verify the constraints instead.
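
          From the user's point of view, the mechanism plays out roughly as
          follows (a sketch; the table and constraint are invented, and at this
          stage deferrable check constraints still sit behind the
          "derby.constraintsTesting" property):

              import java.sql.Connection;
              import java.sql.DriverManager;
              import java.sql.Statement;

              public class DeferredCheckSketch {
                  public static void main(String[] args) throws Exception {
                      Connection conn = DriverManager.getConnection(
                              "jdbc:derby:memory:demo;create=true");
                      conn.setAutoCommit(false);

                      try (Statement s = conn.createStatement()) {
                          s.execute("create table t (a int, "
                                  + "constraint c check (a > 0) deferrable)");
                          s.execute("set constraints c deferred");

                          // Violation: the offending row's location is noted,
                          // but no error is raised yet.
                          s.execute("insert into t values (-1)");

                          // Repair the offending row before it is checked.
                          s.execute("update t set a = 1 where a = -1");

                          // Switching back to immediate revisits the saved
                          // row locations and revalidates; the row passes.
                          s.execute("set constraints c immediate");
                          conn.commit();
                      } finally {
                          conn.close();
                      }
                  }
              }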

          Adding a constraint in deferred constraint mode is currently sub-optimal: we do a full table scan via an internally generated "SELECT .. WHERE NOT <constraints>" (e.g., for a constraint CHECK (a > 0) on table t, roughly SELECT ... FROM t WHERE NOT (a > 0)), and we don't have a way to get at the row locations of the offending rows in this case. I might add a specially tailored result set for that purpose later.

          Normally, when a row is inserted or updated, we execute a generated method which combines evaluation of all check constraints on the table relevant for the inserted or updated columns. This evaluation is performed using McCarthy boolean evaluation (short-circuits as soon as result is known). This isn't optimal for deferred constraints, as we'd need to assume all constraints were violated in such a case. The implementation replaces the short-circuited evaluation with a full evaluation, so we can remember exactly which constraints were violated, cf. AndNoShortCircuitNode and SQLBoolean#throwExceptionIfImmediateAndFalse. A violation in throwExceptionIfImmediateAndFalse when we have a deferred constraint is noted (DMLWriteResultSet#rememberConstraint implemented by UpdateResultSet and InsertResultSet) by adding the violation to a list for that row. After the insert/update is completed, the set of violations is remembered for posterity, cf. InsertResultSet#normalInsertCode and UpdateResultSet#collectAffectedRows by inspecting the lists (#violatingCheckConstraints).
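
          To illustrate the difference between the two evaluation strategies,
          here is a much-simplified model in plain Java (names are invented;
          the real implementation generates the equivalent logic as bytecode
          via AndNoShortCircuitNode):

              import java.util.ArrayList;
              import java.util.List;
              import java.util.function.Predicate;

              public class CheckEvaluationSketch {
                  /** Invented stand-in for one CHECK constraint. */
                  static final class Check {
                      final String name;
                      final Predicate<Object[]> test;
                      Check(String name, Predicate<Object[]> test) {
                          this.name = name;
                          this.test = test;
                      }
                  }

                  // McCarthy (short-circuit) evaluation: stop at the first
                  // failure. Fine for immediate constraints, where the first
                  // violation throws, but it cannot tell us which other
                  // checks failed.
                  static boolean shortCircuitOk(List<Check> checks,
                                                Object[] row) {
                      for (Check c : checks) {
                          if (!c.test.test(row)) {
                              return false; // rest never evaluated
                          }
                      }
                      return true;
                  }

                  // Full evaluation: evaluate every check so that, for
                  // deferred constraints, we know exactly which ones were
                  // violated for this row and can remember them.
                  static List<String> violatedChecks(List<Check> checks,
                                                     Object[] row) {
                      List<String> violated = new ArrayList<>();
                      for (Check c : checks) {
                          if (!c.test.test(row)) {
                              violated.add(c.name); // note, don't throw
                          }
                      }
                      return violated;
                  }
              }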

          Note that we currently do not record which constraints were violated for each individual row, only per table in the transaction. This means that we potentially revisit more rows than necessary when a single constraint is changed to immediate. This could be improved by storing the set of violated constraints along with each row location.

          For bulk insert and deferred (see footnote 1 below) insert row processing there are special code paths, cf. InsertResultSet#offendingRowLocation, which is invoked via a callback from HeapController#load, and another path in InsertResultSet#normalInsertCode, respectively.

          For update, the code for deferred treatment is in one of UpdateResultSet#collectAffectedRows and UpdateResultSet#updateDeferredRows, depending on whether there are triggers.

          The existing test ConstraintCharacteristicsTest has been built out by adding check constraints to those fixtures for which they are relevant, as well as by adding new ones which are only relevant for check constraints.

          1 This "deferred" refers to Derby's special handling of rows in certain situations; for example, when doing an insert which uses the same table as its source result set, we need to make sure we don't get confused and see the incrementally inserted rows "again" as we process the original result set. Essentially we take a snapshot of the source result set, hence "deferred rows".

          All regressions passed.

          Detailed code comments:

          M java/engine/org/apache/derby/iapi/sql/conn/SQLSessionContext.java
          M java/engine/org/apache/derby/impl/sql/conn/SQLSessionContextImpl.java
          M java/engine/org/apache/derby/iapi/sql/conn/LanguageConnectionContext.java
          M java/engine/org/apache/derby/impl/sql/conn/GenericLanguageConnectionContext.java
          D java/engine/org/apache/derby/impl/sql/execute/DeferredDuplicates.java
          A java/engine/org/apache/derby/impl/sql/execute/DeferredConstraintsMemory.java

          Extended and slightly refactored the existing mechanism for deferred primary key/unique constraints to also cater for check constraints. Since the hash key we used for the memory of primary key and unique constraints was the conglomerate id of the indexes, and those are guaranteed to be disjoint from the conglomerate ids of the base tables having deferred constraints, we can use the same hash table to find the "memory", in the form of the disk-based hash table (BackingStoreHashtable), cf. LCC#getDeferredHashTables.

          M java/engine/org/apache/derby/iapi/sql/dictionary/ConstraintDescriptor.java

          Code to drop any deferred constraints memory in the transaction when a constraint is dropped.

          M java/engine/org/apache/derby/impl/store/access/heap/HeapController.java

          Call back added for bulk insert in the presence of deferrable check constraints.

          M java/engine/org/apache/derby/iapi/sql/execute/NoPutResultSet.java
          M java/engine/org/apache/derby/impl/sql/execute/NoPutResultSetImpl.java
          M java/engine/org/apache/derby/iapi/store/access/RowLocationRetRowSource.java

          Extra plumbing to be able to signal to HeapController that we need to do a callback with the inserted row location (for bulk insert)

          M java/engine/org/apache/derby/iapi/sql/execute/TargetResultSet.java

          Extra interface method, offendingRowLocation. Only implemented with meaningful semantics for NoPutResultSetImpl which calls it for its targetResultSet, an InsertResultSet.

          M java/engine/org/apache/derby/iapi/sql/execute/ResultSetFactory.java
          M java/engine/org/apache/derby/impl/sql/execute/GenericResultSetFactory.java

          More parameters to getProjectRestrictResultSet to do the magic mentioned in the overview for that result set; pass along schema and table name to InsertResultSet so we can remember them for check violations. They are used to produce checking SQL statements. This may be a bit fragile, since a rename of the schema or table could make those invalid. However, there is presently no RENAME SCHEMA in Derby, and RENAME TABLE is illegal in certain cases, notably if there is a check constraint defined on the table, so the solution should be OK for now. Also adds an interface method, getValidateCheckConstraintResultSet, to allow the execution run-time to build one of those, cf. code generation logic in NestedLoopJoinStrategy#resultSetMethodName.

          M java/engine/org/apache/derby/iapi/sql/execute/RowChanger.java
          M java/engine/org/apache/derby/impl/sql/execute/RowChangerImpl.java

          Extra parameter to insertRow to get at the row location if needed.

          M java/engine/org/apache/derby/iapi/store/access/BackingStoreHashtable.java
          M java/engine/org/apache/derby/iapi/store/access/ScanController.java

          Javadoc fixes.

          M java/engine/org/apache/derby/iapi/types/BooleanDataValue.java
          M java/engine/org/apache/derby/iapi/types/SQLBoolean.java

          Extra method throwExceptionIfImmediateAndFalse used by deferred check constraints to make a note of all violated constraints as evaluated by the generated method. Picked up by InsertResultSet or UpdateResultSet.

          A java/engine/org/apache/derby/impl/sql/compile/AndNoShortCircuitNode.java
          M java/engine/org/apache/derby/impl/sql/compile/AndNode.java
          M java/engine/org/apache/derby/impl/sql/compile/DMLModStatementNode.java

          AndNoShortCircuitNode is used to represent a non-McCarthy evaluation of the combined check constraints. See usage in DMLModStatementNode#generateCheckTree.

          M java/engine/org/apache/derby/impl/sql/compile/DeleteNode.java

          Extra dummy parameter added for call to super#bindConstraints (DMLModStatementNode). Only used by insert.

          M java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java

          Pick up the DERBY-PROPERTIES value for the property "validateCheckConstraint = <conglomerateId>" we provide to the checking query (internal syntax only) generated by DeferredConstraintsMemory#validateCheck. The conglomerate id is used to retrieve the violating-rows information set up by ProjectRestrictResultSet#openCore to drive ValidateCheckConstraintResultSet.

          M java/engine/org/apache/derby/impl/sql/compile/InsertNode.java

          Boolean member variable to know if we have a deferrable check constraint; also pass the schema and table name to the result set. Passed on to the InsertConstantAction, from which InsertResultSet can pick it up.

          M java/engine/org/apache/derby/impl/sql/compile/ProjectRestrictNode.java

          Logic to keep track of whether we are used by the special internal query to check violated check constraints. In this case we also do not push the check predicates down to store for simpler handling.

          M java/engine/org/apache/derby/impl/sql/compile/QueryTreeNode.java

          Code to parse a long value from "--DERBY-PROPERTIES" property.

          M java/engine/org/apache/derby/impl/sql/compile/SetConstraintsNode.java

          Extra code to comply with the sane mode parse tree printing conventions.

          M java/engine/org/apache/derby/impl/sql/compile/TestConstraintNode.java

          Handle different code generation for deferrable check constraints.

          M java/engine/org/apache/derby/impl/sql/compile/UpdateNode.java

          Pass on more info: schema and table name + small refactoring.

          M java/engine/org/apache/derby/impl/sql/compile/sqlgrammar.jj

          Handle the new internal query to validate violated check constraints. Cf. query in DeferredConstraintsMemory#validateCheck.

          M java/engine/org/apache/derby/impl/sql/execute/AlterConstraintConstantAction.java
          M java/engine/org/apache/derby/impl/sql/execute/CreateConstraintConstantAction.java

          Open up for check constraints.

          M java/engine/org/apache/derby/impl/sql/execute/AlterTableConstantAction.java
          M java/engine/org/apache/derby/impl/sql/execute/ConstraintConstantAction.java

          ATCA: Special handling of adding a deferred check constraint: we need a different code path to get the UUID of the constraint soon enough to be able to note any constraint violations. CCA: note any violation and remember it. We'd like to remember the row locations of the offending rows here, but that is not done for now, so at checking time we'll need a full table scan. This can be improved upon, see code comment.

          M java/engine/org/apache/derby/impl/sql/execute/GenericConstantActionFactory.java
          M java/engine/org/apache/derby/impl/sql/execute/UpdateConstantAction.java
          M java/engine/org/apache/derby/impl/sql/execute/InsertConstantAction.java

          Pass on more info to InsertConstantAction and UpdateConstantAction needed by the result sets.

          M java/engine/org/apache/derby/impl/sql/execute/InsertResultSet.java

          Drives the checking for check constraints, and picks up the result. If we have violations and deferred constraints, we remember that. Also some refactorings to avoid local variables shadowing globals.

          M java/engine/org/apache/derby/impl/sql/execute/UpdateResultSet.java

          Drives the checking for check constraints, and picks up the result. If we have violations and deferred constraints, we remember that.

          M java/engine/org/apache/derby/impl/sql/execute/NoRowsResultSetImpl.java

          Removed unused method.

          M java/engine/org/apache/derby/impl/sql/execute/ProjectRestrictResultSet.java

          Drives the special result set, ValidateCheckConstraintResultSet, by positioning it correctly for each row retrieved, using the row locations remembered from violation time.

          M java/engine/org/apache/derby/impl/sql/execute/SetConstraintsConstantAction.java

          Added logic for check constraints. Also added a new check that the user doesn't specify the same constraint twice, cf. the new test case for it.

          M java/engine/org/apache/derby/impl/sql/execute/TableScanResultSet.java

          Make some members protected rather than private, to let the new result set ValidateCheckConstraintResultSet inherit from it.

          M java/engine/org/apache/derby/impl/sql/execute/TemporaryRowHolderResultSet.java
          M java/engine/org/apache/derby/impl/store/access/sort/MergeScanRowSource.java
          M java/engine/org/apache/derby/impl/store/access/sort/SortBufferRowSource.java
          M java/engine/org/apache/derby/impl/sql/execute/CardinalityCounter.java
          M java/engine/org/apache/derby/impl/sql/execute/DMLWriteResultSet.java

          Boilerplate to comply with the interface (not used).

          M java/engine/org/apache/derby/impl/sql/execute/UniqueIndexSortObserver.java
          M java/engine/org/apache/derby/impl/sql/execute/UniqueWithDuplicateNullsIndexSortObserver.java
          M java/engine/org/apache/derby/impl/sql/execute/IndexChanger.java
          M java/engine/org/apache/derby/impl/sql/execute/CreateIndexConstantAction.java

          Refactoring only.

          A java/engine/org/apache/derby/impl/sql/execute/ValidateCheckConstraintResultSet.java

          The new result set we use to check violating rows, based only on row location.

          M java/engine/org/apache/derby/iapi/sql/compile/JoinStrategy.java
          M java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java
          M java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java

          New boolean to signal that we want ValidateCheckConstraintResultSet

          M java/engine/org/apache/derby/jdbc/EmbedXAResource.java
          M java/engine/org/apache/derby/jdbc/XATransactionState.java

          Extra logic to handle check constraints (already had it for primary key and unique).

          M java/engine/org/apache/derby/iapi/error/ExceptionUtil.java

          Utility method to determine if an exception is a transaction-deferred constraint violation. Needed by the XA code.

          M java/engine/org/apache/derby/loc/messages.xml
          M java/shared/org/apache/derby/shared/common/reference/SQLState.java

          New error messages

          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/ConstraintCharacteristicsTest.java

          New test cases and extension of present ones to include check constraints

          M java/testing/org/apache/derbyTesting/functionTests/tests/upgradeTests/Changes10_11.java

          Extension of present test cases to include check constraints.

          Rick Hillegas added a comment -

          Thanks for the derby-532-check-constraints-1.diff patch, Dag. I took a quick glance at it. A couple points:

          1) It would be good to add header comments to AndNoShortCircuitNode and DeferredConstraintsMemory, explaining what these classes are for.

          2) The header comment for ValidateCheckConstraintResultSet suggests that it is only used when a debug flag is set. Is that true?

          Thanks,
          -Rick

          Dag H. Wanvik added a comment - - edited

          Hi Rick,
          thanks for looking at the patch.

          Uploading version 2, derby-532-check-constraints-2. This adds the comments you requested as well as some minor refactoring to simplify code in lcc a bit.
          As for ValidateCheckConstraintResultSet, it is used in normal operation (not only for debugging). What was the wording that made you think so? It is activated by a directive since it is only for internal use.

          Rick Hillegas added a comment -

          Thanks, Dag. Here's the header comment:

          +/**
          + * Special result set used when checking deferred CHECK constraints.  Activated
          + * by a special {@code --DERBY_PROPERTY validateCheckConstraint=<conglomId>}
          + * override on a SELECT query, cf DeferredConstraintsMemory#validateCheck.  It
          + * relies on having a correct row location set prior to invoking {@code
          + * getNextRowCore}, cf. the special code path in
          + * {@code ProjectRestrictResultSet#getNextRowCore} activated by
          + * {@code #validatingCheckConstraint}.
          + *
          + */
          

          To me that comment says that a ValidateCheckConstraintResultSet is only created when you set --DERBY_PROPERTY validateCheckConstraint=<conglomId>. Thanks.

          Dag H. Wanvik added a comment - - edited

          That's correct. That directive is what I use to make Derby use the non-standard scan when we check the violations once the constraint mode changes back to immediate. Cf. this code in DeferredConstraintsMemory#validateCheck, ca. line 476:

          // If a compress has happened in this transaction, we can't
          // trust the rowLocations, so make a full table scan. If
          // not, we optimize by using a special result set type
          // which utilized the saved away row locations for the
          // offending rows, so we only visit those when checking.
          // I.e. other rows are known to be good a priori.
          if (!ci.isInvalidated()) {
              checkStmt.append(
                  " --DERBY-PROPERTIES joinStrategy=nestedLoop, " +
                      "                    index=null, " +
                      "                    validateCheckConstraint=");
              checkStmt.append(Long.toString(baseTableCID));
              checkStmt.append('\n');
          }
          
          checkStmt.append(" WHERE NOT(");
          checkStmt.append(cd.getConstraintText());
          checkStmt.append(')');
          

          This is normal code path, though, not only for debugging.
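
          For illustration, with a hypothetical table "APP"."T" (base table conglomerate id 1234) and constraint text C > 0, the complete generated statement would look roughly like this (the select list is a sketch; the exact text is built by DeferredConstraintsMemory#validateCheck):

          SELECT 1 FROM "APP"."T"
           --DERBY-PROPERTIES joinStrategy=nestedLoop, index=null, validateCheckConstraint=1234
          WHERE NOT(C > 0)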

          Rick Hillegas added a comment -

          Thanks for that explanation, Dag. Now I understand.

          ASF subversion and git services added a comment -

          Commit 1576367 from Dag H. Wanvik in branch 'code/trunk'
          [ https://svn.apache.org/r1576367 ]

          DERBY-532 Support deferrable constraints

          Patch derby-532-check-constraints-2 which implements deferred CHECK
          constraints and supporting tests.

          The high level approach is as follows. When a violation occurs, we note the row
          location in the base table of the offending row. At commit time (or when
          switching a constraint to immediate), we revisit those rows using the row
          locations if they are still valid, and validate those rows again. This is
          achieved by positioning to the saved row locations in combination with a
          specially crafted result set, ValidateCheckConstraintResultSet (see
          ProjectRestrictResultSet#getNextRowCore), which is positioned to the offending
          base row via its positionScanAtRowLocation method before it reads the
          row. If the row locations are
          no longer valid, e.g. an intervening compress happened, we do a full table scan
          to verify the constraints instead.
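
          As a minimal ij-style sketch of the externally visible behavior (assuming the new deferrable constraint syntax):

          CREATE TABLE t(i INT, CONSTRAINT c_pos CHECK (i > 0)
              DEFERRABLE INITIALLY DEFERRED);
          INSERT INTO t VALUES (-1); -- violation only noted; row location saved
          UPDATE t SET i = 1;        -- offending row is fixed before commit
          COMMIT;                    -- saved row location revisited; row now passes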

          Adding a constraint in deferred constraint mode is currently sub-optimal, since
          we do a full table scan via an internally generated "SELECT .. WHERE
          NOT <constraints>", and we don't have a way to get at the row locations of the
          offending rows in this case. I might add a specially tailored result set for
          that purpose later.
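
          A sketch of the case meant here (constraint and table names hypothetical):

          -- constraint added while its mode is deferred; the existing rows are
          -- vetted at checking time with a full scan, roughly:
          --   SELECT ... FROM t WHERE NOT (i > 0)
          ALTER TABLE t ADD CONSTRAINT c_pos CHECK (i > 0)
              DEFERRABLE INITIALLY DEFERRED;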

          Normally, when a row is inserted or updated, we execute a generated method which
          combines evaluation of all check constraints on the table relevant for the
          inserted or updated columns. This evaluation is performed using McCarthy boolean
          evaluation (short-circuiting as soon as the result is known). This isn't optimal for
          deferred constraints, as we'd need to assume all constraints were violated in
          such a case. The implementation replaces the short-circuited evaluation with a
          full evaluation, so we can remember exactly which constraints were violated,
          cf. AndNoShortCircuitNode and SQLBoolean#throwExceptionIfImmediateAndFalse. A
          violation in throwExceptionIfImmediateAndFalse when we have a deferred
          constraint is noted (DMLWriteResultSet#rememberConstraint implemented by
          UpdateResultSet and InsertResultSet) by adding the violation to a list for that
          row. After the insert/update is completed, the set of violations is remembered
          for posterity, cf. InsertResultSet#normalInsertCode and
          UpdateResultSet#collectAffectedRows by inspecting the lists
          (#violatingCheckConstraints).
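
          To see why full evaluation matters, consider this sketch (names hypothetical): a single inserted row can violate several deferred check constraints at once, and all of them must be remembered:

          CREATE TABLE t2(a INT, b INT,
              CONSTRAINT ca CHECK (a > 0) DEFERRABLE INITIALLY DEFERRED,
              CONSTRAINT cb CHECK (b > 0) DEFERRABLE INITIALLY DEFERRED);
          INSERT INTO t2 VALUES (-1, -1); -- short-circuit evaluation would stop
                                          -- at ca; full evaluation notes both
          SET CONSTRAINTS cb IMMEDIATE;   -- must fail: cb's violation was
                                          -- remembered as well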

          Note that we currently do not note which constraints were violated *for each
          individual row*, only per table in the transaction. This means that we
          potentially revisit more rows than necessary when a single constraint is changed
          to immediate. This could be improved further by storing the set of violated
          constraints along with the row location.

          For bulk insert and deferred (see note [1] below) insert row processing there
          are special code paths, cf. InsertResultSet#offendingRowLocation, which is
          invoked via a callback from HeapController#load, and another path in
          InsertResultSet#normalInsertCode, respectively.

          For update, the code for deferred treatment is in one of
          UpdateResultSet#collectAffectedRows and UpdateResultSet#updateDeferredRows,
          depending on whether there are triggers.

          The existing test ConstraintCharacteristicsTest has been extended by adding
          check constraints to those fixtures for which they are relevant, as well as
          adding new fixtures which are only relevant for check constraints.

          [1] This "deferred" refers to Derby special handling of rows in certain
          situation, for example when doing an insert which uses the same table as a
          source result set, we need to make sure we don't get confused and see the
          incrementally inserted rows "again" as we process the original result set,
          essentially we do a snapshot of the source result set, hence "deferred rows".
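
          A classic example of such a statement is a self-insert, where the source result set reads the very table being inserted into (sketch):

          -- without the snapshot ("deferred rows"), the scan could see the rows
          -- it is itself inserting
          INSERT INTO t SELECT i + 1 FROM t;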

          All regressions passed.

          Detailed code comments:

          M java/engine/org/apache/derby/iapi/sql/conn/SQLSessionContext.java
          M java/engine/org/apache/derby/impl/sql/conn/SQLSessionContextImpl.java
          M java/engine/org/apache/derby/iapi/sql/conn/LanguageConnectionContext.java
          M java/engine/org/apache/derby/impl/sql/conn/GenericLanguageConnectionContext.java
          D java/engine/org/apache/derby/impl/sql/execute/DeferredDuplicates.java
          A java/engine/org/apache/derby/impl/sql/execute/DeferredConstraintsMemory.java

          Extended and slightly refactored the existing mechanism for deferred primary
          key/unique constraints to also cater for check constraints. Since the hash key
          we used for the memory of primary key and unique constraints was the
          conglomerate id of the indexes, and those are guaranteed to be disjoint from the
          conglomerate ids of the base tables having deferred constraints, we can use the
          same hash table to find the "memory" in the form of the disk based hash table
          (BackingStoreHashtable), cf. LCC#getDeferredHashTables.

          M java/engine/org/apache/derby/iapi/sql/dictionary/ConstraintDescriptor.java

          Code to drop any deferred constraints memory in the transaction when a
          constraint is dropped.

          M java/engine/org/apache/derby/impl/store/access/heap/HeapController.java

          Call back added for bulk insert in the presence of deferrable check constraints.

          M java/engine/org/apache/derby/iapi/sql/execute/NoPutResultSet.java
          M java/engine/org/apache/derby/impl/sql/execute/NoPutResultSetImpl.java
          M java/engine/org/apache/derby/iapi/store/access/RowLocationRetRowSource.java

          Extra plumbing to be able to signal to HeapController that we need to do a
          callback with the inserted row location (for bulk insert).

          M java/engine/org/apache/derby/iapi/sql/execute/TargetResultSet.java

          Extra interface method, offendingRowLocation. Only implemented with meaningful
          semantics for NoPutResultSetImpl which calls it for its targetResultSet, an
          InsertResultSet.

          M java/engine/org/apache/derby/iapi/sql/execute/ResultSetFactory.java
          M java/engine/org/apache/derby/impl/sql/execute/GenericResultSetFactory.java

          More parameters to getProjectRestrictResultSet to do the magic mentioned in the
          overview for that result set; pass along schema and table name to
          InsertResultSet so we can remember them for check violations. They are used to
          produce the checking SQL statements. This may be a bit fragile, since a rename
          of the schema or table could make those invalid. However, there is presently no
          RENAME SCHEMA in Derby, and RENAME TABLE is illegal in certain cases, notably if
          there is a check constraint defined on the table, so the solution should be OK
          for now. Also adds an interface method, getValidateCheckConstraintResultSet, to
          allow the execution run-time to build one of those, cf. code generation logic in
          NestedLoopJoinStrategy#resultSetMethodName.

          M java/engine/org/apache/derby/iapi/sql/execute/RowChanger.java
          M java/engine/org/apache/derby/impl/sql/execute/RowChangerImpl.java

          Extra parameter to insertRow to get at the row location if needed.

          M java/engine/org/apache/derby/iapi/store/access/BackingStoreHashtable.java
          M java/engine/org/apache/derby/iapi/store/access/ScanController.java

          Javadoc fixes.

          M java/engine/org/apache/derby/iapi/types/BooleanDataValue.java
          M java/engine/org/apache/derby/iapi/types/SQLBoolean.java

          Extra method throwExceptionIfImmediateAndFalse used by deferred check
          constraints to make a note of all violated constraints as evaluated by the
          generated method. Picked up by InsertResultSet or UpdateResultSet.

          A java/engine/org/apache/derby/impl/sql/compile/AndNoShortCircuitNode.java
          M java/engine/org/apache/derby/impl/sql/compile/AndNode.java
          M java/engine/org/apache/derby/impl/sql/compile/DMLModStatementNode.java

          AndNoShortCircuitNode is used to represent a non-McCarthy evaluation of the
          combined check constraints. See usage in DMLModStatementNode#generateCheckTree.

          M java/engine/org/apache/derby/impl/sql/compile/DeleteNode.java

          Extra dummy parameter added for call to super#bindConstraints
          (DMLModStatementNode). Only used by insert.

          M java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java

          Pick up the DERBY-PROPERTIES value for the property "validateCheckConstraint =
          <conglomerateId>" we provide to the checking query (internal syntax only)
          generated by DeferredConstraintsMemory#validateCheck. The conglomerate id is
          used to retrieve the violating rows information set up by
          ProjectRestrictResultSet#openCore to drive ValidateCheckConstraintResultSet.

          M java/engine/org/apache/derby/impl/sql/compile/InsertNode.java

          Boolean member variable to know if we have a deferrable check constraint; also
          pass only the schema and table name to the result set. These are passed on to
          the InsertConstantAction, from which InsertResultSet can pick them up.

          M java/engine/org/apache/derby/impl/sql/compile/ProjectRestrictNode.java

          Logic to keep track of whether we are used by the special internal query to
          check violated check constraints. In this case we also do not push the check
          predicates down to the store, for simpler handling.

          M java/engine/org/apache/derby/impl/sql/compile/QueryTreeNode.java

          Code to parse a long value from a "--DERBY-PROPERTIES" property.

          M java/engine/org/apache/derby/impl/sql/compile/SetConstraintsNode.java

          Extra code to comply with the sane mode parse tree printing conventions.

          M java/engine/org/apache/derby/impl/sql/compile/TestConstraintNode.java

          Handle different code generation for deferrable check constraints.

          M java/engine/org/apache/derby/impl/sql/compile/UpdateNode.java

          Pass on more info (schema and table name), plus a small refactoring.

          M java/engine/org/apache/derby/impl/sql/compile/sqlgrammar.jj

          Handle the new internal query to validate violated check constraints. Cf. query
          in DeferredConstraintsMemory#validateCheck.

          M java/engine/org/apache/derby/impl/sql/execute/AlterConstraintConstantAction.java
          M java/engine/org/apache/derby/impl/sql/execute/CreateConstraintConstantAction.java

          Open up for check constraints.

          M java/engine/org/apache/derby/impl/sql/execute/AlterTableConstantAction.java
          M java/engine/org/apache/derby/impl/sql/execute/ConstraintConstantAction.java

          ATCA: Special handling of adding a deferred check constraint: we need a
          different code path to get the UUID of the constraint soon enough to be able to
          note any constraint violations. CCA: note any violation and remember it. We'd
          like to remember the row locations of the offending rows here too, but that is
          not done for now, so at checking time we'll need a full table scan. This can be
          improved upon, see code comment.

          M java/engine/org/apache/derby/impl/sql/execute/GenericConstantActionFactory.java
          M java/engine/org/apache/derby/impl/sql/execute/UpdateConstantAction.java
          M java/engine/org/apache/derby/impl/sql/execute/InsertConstantAction.java

          Pass on more info to InsertConstantAction and UpdateConstantAction needed by the
          result sets.

          M java/engine/org/apache/derby/impl/sql/execute/InsertResultSet.java

          Drives the checking for check constraints, and picks up the result. If we have
          violations and deferred constraints, we remember that. Also some refactorings to
          avoid local variables shadowing globals.

          M java/engine/org/apache/derby/impl/sql/execute/UpdateResultSet.java

          Drives the checking for check constraints, and picks up the result. If we have
          violations and deferred constraints, we remember that.

          M java/engine/org/apache/derby/impl/sql/execute/NoRowsResultSetImpl.java

          Removed unused method.

          M java/engine/org/apache/derby/impl/sql/execute/ProjectRestrictResultSet.java

          Drive the special result set, ValidateCheckConstraintResultSet, by positioning
          it correctly for each row retrieved, using the remembered row locations from
          violation time.

          M java/engine/org/apache/derby/impl/sql/execute/SetConstraintsConstantAction.java

          Added logic for check constraints. Also added a new check that the user doesn't
          specify the same constraint twice, cf. the new test case for it.
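
          A sketch of the newly rejected usage (constraint name hypothetical):

          SET CONSTRAINTS c_pos, c_pos IMMEDIATE; -- now raises an error: the
                                                  -- same constraint twice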

          M java/engine/org/apache/derby/impl/sql/execute/TableScanResultSet.java

          Make some members protected rather than private, to let the new result set
          ValidateCheckConstraintResultSet inherit from it.

          M java/engine/org/apache/derby/impl/sql/execute/TemporaryRowHolderResultSet.java
          M java/engine/org/apache/derby/impl/store/access/sort/MergeScanRowSource.java
          M java/engine/org/apache/derby/impl/store/access/sort/SortBufferRowSource.java
          M java/engine/org/apache/derby/impl/sql/execute/CardinalityCounter.java
          M java/engine/org/apache/derby/impl/sql/execute/DMLWriteResultSet.java

          Boilerplate to comply with the interface (not used).

          M java/engine/org/apache/derby/impl/sql/execute/UniqueIndexSortObserver.java
          M java/engine/org/apache/derby/impl/sql/execute/UniqueWithDuplicateNullsIndexSortObserver.java
          M java/engine/org/apache/derby/impl/sql/execute/IndexChanger.java
          M java/engine/org/apache/derby/impl/sql/execute/CreateIndexConstantAction.java

          Refactoring only.

          A java/engine/org/apache/derby/impl/sql/execute/ValidateCheckConstraintResultSet.java

          The new result set we use to check only the violating rows, based on their row
          locations.

          M java/engine/org/apache/derby/iapi/sql/compile/JoinStrategy.java
          M java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java
          M java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java

          New boolean to signal that we want a ValidateCheckConstraintResultSet.

          M java/engine/org/apache/derby/jdbc/EmbedXAResource.java
          M java/engine/org/apache/derby/jdbc/XATransactionState.java

          Extra logic to handle check constraints (already had it for primary key and unique).

          M java/engine/org/apache/derby/iapi/error/ExceptionUtil.java

          Utility method to determine whether an exception is a deferred constraint
          violation raised at the transaction level. Needed by the XA code.

          M java/engine/org/apache/derby/loc/messages.xml
          M java/shared/org/apache/derby/shared/common/reference/SQLState.java

          New error messages.

          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/ConstraintCharacteristicsTest.java

          New test cases and extension of existing ones to include check constraints.

          M java/testing/org/apache/derbyTesting/functionTests/tests/upgradeTests/Changes10_11.java

          Extension of existing test cases to include check constraints.

          ASF subversion and git services added a comment -

          Commit 1577134 from Dag H. Wanvik in branch 'code/trunk'
          [ https://svn.apache.org/r1577134 ]

          DERBY-532 Support deferrable constraints

          Added an extra test case.

          Dag H. Wanvik added a comment -

          Marking this as resolved.

          Dag H. Wanvik added a comment -

          Oops, wrong issue, reopening.

          ASF subversion and git services added a comment -

          Commit 1579737 from Dag H. Wanvik in branch 'code/branches/10.10'
          [ https://svn.apache.org/r1579737 ]

          DERBY-6374 Bulk insert of data with nullable UNIQUE constraint fails to detect duplicates

          Backported to 10.10, had to do major surgery since the original fix
          was part of a DERBY-532 patch.

          Dag H. Wanvik added a comment - - edited

          Attaching an exploratory patch https://issues.apache.org/jira/secure/attachment/12637050/derby-532-fk-baseline.diff for foreign key constraints: the new test ForeignKeysDeferrableTest (which I may integrate into the existing test for deferrable constraints later) contains fixtures that enumerate the different places/code paths in which we check foreign key constraints. The solution for
          deferrable constraints needs to handle all of these.

          Dag H. Wanvik added a comment - - edited

          Uploading version 2 of the foreign constraints baseline test patch (improvements).
          https://issues.apache.org/jira/secure/attachment/12637180/derby-532-fk-baseline-2.diff

          Dag H. Wanvik added a comment -

          Uploading a first working patch for foreign constraints (derby-532-fk-1) with a new test, ForeignKeysDeferrableTest. The patch is preliminary; more testing needs to happen, but the basic functionality works for what's tested. It's not for commit yet, just for an early view/proof-of-concept.

          Dag H. Wanvik added a comment - - edited

          Uploading patch derby-532-fk-3, which relative to version 1 adds invalidation of saved rows for deferred foreign key constraints and adds tests for that. In addition, this led to uncovering some issues with invalidation for unique constraints as well, also fixed here.
          I don't know of any missing functionality for deferred foreign keys at this point, so I'll look to committing this patch modulo review comments - reviews are welcome at this point!
          Next, I plan to add more tests for more complicated use cases.

          JUnit regressions passed.

          Dag H. Wanvik added a comment -

          The approach taken for deferring foreign keys is similar to that taken for the other constraints: when we detect a violation while inserting into or updating the referring table, or while deleting from or updating the referenced table (the latter only when we have ON DELETE (or UPDATE) NO ACTION), we save the key in a temporary table instead of throwing an exception. At check time, typically on commit, we first revisit the supporting index of the referencing table to see if there might still be a problem. If that key is (still) present, we also check the corresponding index in the referenced table. If the key is found there, all is good; otherwise we throw. The main new pieces of the machinery are ForeignKeysDeferrableTest#rememberFKViolation and #validateForeignKey and GenericLanguageConnectionContext#doValidateForeignKey and its uses.
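
          A minimal SQL sketch of the intended behavior (names hypothetical; syntax per the deferrable constraint characteristics added in this work):

          CREATE TABLE parent(id INT NOT NULL PRIMARY KEY);
          CREATE TABLE child(pid INT,
              CONSTRAINT fk FOREIGN KEY (pid) REFERENCES parent(id)
                  DEFERRABLE INITIALLY DEFERRED);
          INSERT INTO child VALUES (1);  -- key saved in temporary table, no error
          INSERT INTO parent VALUES (1); -- violation cured before commit
          COMMIT;                        -- indexes revisited; check passes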

          Dag H. Wanvik added a comment -

          Uploading a refactored version of last patch: derby-532-fk-4.
          It gives better encapsulation/object orientation relative to patch #3.

          Dag H. Wanvik added a comment -

          Uploading derby-532-fk-5, a revision of #4: small simplifications to assertions in ConstraintCharacteristicsTest, and clean-up of overly long lines.
          Rerunning regressions.


            People

            • Assignee: Dag H. Wanvik
            • Reporter: Jörg von Frantzius
            • Votes: 6
            • Watchers: 8