DERBY-5367: Stale data retrieved when using new collation=TERRITORY_BASED:PRIMARY feature

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.8.1.2, 10.9.1.0
    • Fix Version/s: 10.8.2.2, 10.9.1.0
    • Component/s: Store
    • Labels:
      None
    • Environment:
      Mac OS X, Windows
    • Bug behavior facts:
      Data corruption

      Description

      Our product recently upgraded to version 10.8.1.2 in order to take advantage of the new 'case-insensitive' mode offered by Derby in the form of the "collation=TERRITORY_BASED:PRIMARY" connection parameter.

      Unfortunately, we have run into an issue whereby stale data appears to be retrieved from an index, even though the data in the table itself has changed.

      You can see this issue in the IJ session below. The database in question was created using this Java parameter when invoking IJ:

      -Dij.database=jdbc:derby:test;create=true;collation=TERRITORY_BASED:PRIMARY

      Here is the IJ session:

      CONNECTION0* - jdbc:derby:test

      * = current connection

      ij> CREATE TABLE tag (
      tag_id INTEGER GENERATED BY DEFAULT AS IDENTITY NOT NULL,
      tag VARCHAR(255) NOT NULL,
      CONSTRAINT tag_pk PRIMARY KEY (tag_id),
      CONSTRAINT tag_tag_unique UNIQUE (tag)
      );
      0 rows inserted/updated/deleted

      ij> -- first insert a value 'Test', note the upper-case 'T' in 'Test'
      ij> INSERT INTO tag (tag) VALUES ('Test');
      1 row inserted/updated/deleted

      ij> SELECT * FROM tag;
      TAG_ID |TAG
      --------------------------------------------------------------------------------------------------------------------------------------------
      1 |Test

      1 row selected

      ij> -- Now delete the row
      ij> DELETE FROM tag WHERE tag='Test';
      1 row inserted/updated/deleted

      ij> -- You could run another SELECT here to verify the row is gone (it is).

      ij> -- Now insert a new value 'test', note the lower-case 't' in 'test'
      ij> INSERT INTO tag (tag) VALUES ('test');
      1 row inserted/updated/deleted

      ij> -- Now verify that the table contains only the lower-case version: 'test'
      ij> SELECT * FROM tag;
      TAG_ID |TAG
      --------------------------------------------------------------------------------------------------------------------------------------------
      2 |test

      1 row selected

      ij> -- Now, here is the bug.
      ij> SELECT tag FROM tag;
      TAG
      --------------------------------------------------------------------------------------------------------------------------------
      Test

      1 row selected
      ij>

      Note that in the last SELECT we name the 'tag' column explicitly. When we 'SELECT *', Derby performs a table scan and the result is correct. However, when we 'SELECT tag', Derby appears to use the index created for the 'tag_tag_unique' unique constraint. As an optimization, Derby, like many databases, reads values directly from the index when the index covers all requested columns.

      The bigger question is, why doesn't the DELETE action cause the entry in the tag_tag_unique index to be deleted? Is this a further optimization? If so, it is imperative that the index at least be updated when the new value is inserted.

      This is a rather severe bug for us, as it causes stale data to be returned.
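
      The same failure can be driven from JDBC. Below is a minimal sketch that mirrors the ij session above; it assumes the Derby embedded driver is on the classpath and that the database name 'test' is not already in use. On affected versions the final query (the covering-index scan) prints the stale value 'Test'.

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.ResultSet;
      import java.sql.Statement;

      public class Derby5367Repro {
          public static void main(String[] args) throws Exception {
              // Create a case-insensitive database, mirroring the ij invocation above.
              String url = "jdbc:derby:test;create=true;collation=TERRITORY_BASED:PRIMARY";
              try (Connection conn = DriverManager.getConnection(url);
                   Statement st = conn.createStatement()) {
                  st.executeUpdate("CREATE TABLE tag ("
                          + "tag_id INTEGER GENERATED BY DEFAULT AS IDENTITY NOT NULL, "
                          + "tag VARCHAR(255) NOT NULL, "
                          + "CONSTRAINT tag_pk PRIMARY KEY (tag_id), "
                          + "CONSTRAINT tag_tag_unique UNIQUE (tag))");
                  st.executeUpdate("INSERT INTO tag (tag) VALUES ('Test')");
                  st.executeUpdate("DELETE FROM tag WHERE tag='Test'");
                  st.executeUpdate("INSERT INTO tag (tag) VALUES ('test')");
                  // Covering-index scan: returns the stale 'Test' on affected versions.
                  try (ResultSet rs = st.executeQuery("SELECT tag FROM tag")) {
                      while (rs.next()) {
                          System.out.println("index scan sees: " + rs.getString(1));
                      }
                  }
              }
          }
      }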

      1. derby-5367-4c-fix_with_optimization_improved.diff
        22 kB
        Kristian Waagan
      2. derby-5367-4b-fix_with_optimization_improved.diff
        21 kB
        Kristian Waagan
      3. derby-5367-4a-fix_with_optimization_improved.diff
        5 kB
        Kristian Waagan
      4. derby-5367-3a-update_field_by_field_preview.diff
        1 kB
        Kristian Waagan
      5. derby-5367-2a-minimal_fix.diff
        2 kB
        Kristian Waagan
      6. derby-5367-1b-update_row_fully.stat
        0.3 kB
        Kristian Waagan
      7. derby-5367-1b-update_row_fully.diff
        8 kB
        Kristian Waagan
      8. derby-5367-1a-update_row_fully.diff
        4 kB
        Kristian Waagan

        Activity

        Gavin made changes -
        Workflow jira [ 12625077 ] Default workflow, editable Closed status [ 12802740 ]
        Knut Anders Hatlen made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Knut Anders Hatlen added a comment -

        [bulk update] Close all resolved issues that haven't been updated for more than one year.

        Kristian Waagan made changes -
        Status In Progress [ 3 ] Resolved [ 5 ]
        Issue & fix info [Patch Available]
        Fix Version/s 10.8.2.2 [ 12317968 ]
        Resolution Fixed [ 1 ]
        Kristian Waagan added a comment -

        Committed to 10.8 with revision 1175171 (the fix went in with revision 1174436 on trunk).

        I don't expect more work on this issue now, but there seems to be related work in this area of Derby.

        Mike Matrigali added a comment -

        I reviewed the latest patch and it looks good to me. I agree that the overhead should be basically "free" now, with the code working off the stored conglomerate info and not having to recalculate it.

        It is great that your test case both exercises the newly created conglomerate and shuts down and restarts the database to test the case where the conglomerate is read from disk.

        Kristian Waagan added a comment -

        Tests ran cleanly on 10.8 (with the fix merged from trunk), with the exception of this error in testConcurrentInvalidation(org.apache.derbyTesting.functionTests.tests.lang.TruncateTableTest):
        Helper thread failed
        ...
        Fri Sep 23 01:19:36 CEST 2011 Thread[DRDAConnThread_77,5,derby.daemons] (XID = 24553), (SESSIONID = 31), (DATABASE = dbsqlauth), (DRDAID = ��������.��-808957766332849398

        {3}

        ), Failed Statement is: select * from d4275
        ERROR XSAI2: The conglomerate (4,448) requested does not exist.
        at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:286)
        at org.apache.derby.impl.store.access.heap.HeapConglomerateFactory.readConglomerate(HeapConglomerateFactory.java:254)
        at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:482)
        at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:394)
        at org.apache.derby.impl.store.access.RAMTransaction.getDynamicCompiledConglomInfo(RAMTransaction.java:692)
        at org.apache.derby.impl.sql.execute.TableScanResultSet.openCore(TableScanResultSet.java:245)
        at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.openCore(BulkTableScanResultSet.java:248)
        at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.open(BasicNoPutResultSetImpl.java:255)
        at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:436)
        at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:317)
        at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(EmbedPreparedStatement.java:1341)
        at org.apache.derby.impl.drda.DRDAStatement.execute(DRDAStatement.java:706)
        at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java:866)
        at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java:295)
        Cleanup action completed

        I plan to backport the fix on Saturday.

        Kristian Waagan made changes -
        Fix Version/s 10.9.0.0 [ 12316344 ]
        Kristian Waagan added a comment -

        Committed patch 4c to trunk with revision 1174436.

        Kristian Waagan made changes -
        Kristian Waagan added a comment -

        Attaching patch 4c. The only difference from 4b is an extra assert in OpenConglomerateScratchSpace.

        Kristian Waagan made changes -
        Kristian Waagan added a comment -

        Mike,

        My replies below, to your comments above.

        o I've changed this a bit again, see patch 4b.
        I now detect if there are columns with other collations than UCS BASIC
        when the conglomerate is created or opened. By tacking on to
        ConglomerateUtil.readCollationIdArray this comes almost for free (need
        one more boolean field in the conglom classes though), and we just have
        to iterate through the ids once upon creation.
        Derby always has to do an if on hasCollatedTypes(). This could maybe be
        made final with some refactoring (not sure if that would matter though).

        o Yes, the new test I wrote, based on the bug report, will be included.
        It is now part of patch 4b.
        There are no other tests stressing this piece of code in suites.All.

        o Please report this. It looked suspicious to me the first time I saw it,
        but I don't have the knowledge to add any more context.
        FYI, the test jdbciapi.SURQueryMixTest triggers that piece of code.
        (SUR = scrollable updatable resultset)

        o Noted. It would be good to log a JIRA for this if not already done.

        Now that it looks like there might be another RC, I'd like to get a fix for
        this issue included. Unfortunately, I'll be away next week. I'll commit this
        tomorrow if I don't get any pushback before then.
        If I commit and things go awry, back out the fix (it will be a single commit only).
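
        For readers not familiar with the store code, the detection described in the first bullet above amounts to a single pass over the conglomerate's collation ids. The following is only a rough sketch of that idea, not code from the patch; the class name and the integration point are assumptions, while the UCS BASIC constant is the one declared in Derby's StringDataValue interface.

        import org.apache.derby.iapi.types.StringDataValue;

        /**
         * Sketch: scan the collation ids once when the conglomerate is created or
         * read from disk, and remember whether any column uses a collation other
         * than UCS BASIC. Names are illustrative, not from the actual patch.
         */
        final class CollationInfoSketch {
            private final boolean hasCollatedTypes;

            CollationInfoSketch(int[] collationIds) {
                boolean collated = false;
                for (int id : collationIds) {
                    if (id != StringDataValue.COLLATION_TYPE_UCS_BASIC) {
                        collated = true;
                        break;
                    }
                }
                this.hasCollatedTypes = collated;
            }

            /** Cheap check used on the insert path to pick the code path. */
            boolean hasCollatedTypes() {
                return hasCollatedTypes;
            }
        }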

        Mike Matrigali added a comment -

        I have reviewed the most recent patch. Here are comments, none of which I would say should hold up a commit:

        o I much prefer the current optimization, as it is now a compile-time decision. I still would prefer that non-collated DBs did not have to march through all the columns and calculate whether the row has collated columns on every compile. I might have pushed the work up to language by adding an argument to OpenConglomerateScratchSpace to pass in hasCollatedTypes. That would make the most sense if the calling level has a better understanding of the collation state of the DB and/or table and could avoid the calculation.

        o I believe you said it, but just want to verify that you ran at least one set of tests where all tests went through the proposed collation code path just
        to make sure it was tested once. My guess is that the new codepath you are adding is probably not tested at all, without adding some new tests.
        Seems like you should at least add some version of the test case from the bug report.

        o I think the following is also a bug, for the same reason as the one you are working on. I'm not sure how to force the code path, and I'm happy to have it as a separate JIRA - let me know if I should report it:

        if (this.getConglomerate().nKeyFields ==
                this.getConglomerate().nUniqueColumns)
        {
            // The row that we found deleted is exactly the new row
            targetleaf.page.deleteAtSlot(insert_slot, false, this.btree_undo);
            break;
        }

        o Longer term, I think we should not be doing the update at all and instead do purges. I think that would solve some of the possible out-of-space issues. But I think there are self-deadlocking issues there, so I am in favor of your approach, especially for backporting to existing releases.

        Dag H. Wanvik added a comment -

        If I understand this correctly, the optimization is the old code (already well tested), which would still apply to non-collated DBs. So the new code would only apply to the (already buggy) collated code path. If so, it seems justified to me to have the optimization.

        Brett Wooldridge added a comment -

        Kristian,

        With respect to this particular case (collation), the insert-delete-insert pattern is strictly based on user input. The way this was discovered was that a user created a "tag" on an entity in our system (e.g. 'cisco'), realized they had mistyped, deleted 'cisco', and added 'Cisco'. This resulted in the lower-case 'cisco' coming back (because of this bug). So to answer your question, with respect to this bug, I would classify it as "low frequency".

        However, obviously, insert-delete-insert patterns are fairly common in general. Any system that collects information in an automated fashion (news feeds, network data, seismic data, etc.) might delete a bunch of old rows and insert new rows with greater frequency.

        To tell you the truth, because of https://issues.apache.org/jira/browse/DERBY-4279, I've abandoned this "pattern" almost completely in Derby. Whenever I need to delete a bunch of rows ('bunch' being tens of rows) and insert a new batch in the same transaction, because of 4279 I now have to mark the rows deleted (with an UPDATE statement) and then insert the new rows. Deletion of the marked rows is handled later via a scheduled deletion job.

        That of course is an aside (4279), but I'm surprised you can actually run your benchmark without running into it.
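
        To make the soft-delete workaround described above concrete, here is a rough sketch; the table (feed_item) and column (is_deleted) names are hypothetical, and this is not code from the reporter's application.

        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import java.sql.SQLException;

        /**
         * Sketch of the workaround: mark rows as deleted instead of deleting them,
         * and purge the marked rows later from a scheduled job. Assumes auto-commit
         * has been disabled on the connection; schema names are hypothetical.
         */
        class SoftDeleteSketch {

            /** In the user transaction: mark old rows and insert the replacement batch. */
            static void replaceItems(Connection conn, String source, String[] newItems)
                    throws SQLException {
                try (PreparedStatement mark = conn.prepareStatement(
                             "UPDATE feed_item SET is_deleted = TRUE WHERE source = ?");
                     PreparedStatement insert = conn.prepareStatement(
                             "INSERT INTO feed_item (source, payload, is_deleted) VALUES (?, ?, FALSE)")) {
                    mark.setString(1, source);
                    mark.executeUpdate();
                    for (String item : newItems) {
                        insert.setString(1, source);
                        insert.setString(2, item);
                        insert.executeUpdate();
                    }
                }
                conn.commit();
            }

            /** Run later by a scheduled job, outside the user-facing transaction. */
            static void purgeMarkedRows(Connection conn) throws SQLException {
                try (PreparedStatement purge = conn.prepareStatement(
                             "DELETE FROM feed_item WHERE is_deleted = TRUE")) {
                    purge.executeUpdate();
                }
                conn.commit();
            }
        }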

        Kristian Waagan added a comment -

        Mike, what's your take on a fix along the lines of patch 4a?
        I'd have to run the same set of benchmarks on a non-collated database first, but if those results look okay I lean towards a commit.

        I don't have a good sense of how often this code will be run, so it's hard for me to say if we should just not optimize this at all. Judging by the performance numbers only, I would definitely do the optimization.

        I want to see a fix for this issue shortly, such that people affected can run off the head of 10.8 with their own build if they want to do so (with the risks associated of course).

        Kristian Waagan added a comment -

        Brett, do you have an idea of how often the insert-delete-insert pattern is executed in your application(s)? (on tables with an index, with or without collation)

        Kristian Waagan made changes -
        Kristian Waagan added a comment -

        Attaching patch 4a, which takes another approach to optimizing the code. The old/current code will be run on indexes which don't have collated types, whereas for indexes with collated types all fields will be updated (one by one).

        Hopefully, one should see identical performance in the normal case with this patch.

        With collated types performance will drop as indicated in the performance results I posted above (worst case). It might be possible to trade CPU for IO, i.e. to compare the values before deciding to update the field. One can also avoid updating fields that aren't collated (only relevant for composite keys).

        Patch ready for review.

        Kristian Waagan added a comment - - edited

        I have run two simple benchmarks. The first one I cannot post the source code for, but it populates the database and then it starts deleting and inserting records. Here I saw around 7% performance decrease with the proposed fix.

        I'll attach the source for the other benchmark. It is very simple, and only inserts and deletes the same record over and over again. The insert and delete are done as one transaction.
        I ran with 60 s warmup and 300 s steady-state, ten runs per configuration. Unless stated otherwise, all settings are default. I just rounded the final numbers.
        Explanation for the "bad" runs follows further down.

        — single column key
        clean, auto on: 637 tps
        d5367, auto on: 557 tps
        ==> -12.5%

        clean, auto off: 3560 tps (5 "bad" runs)
        d5367, auto off: 2874 tps (3 "bad" runs)
        ==> -19.5%
        (commit interval 5k)

        — composite key, three columns
        clean, auto on: 551 tps
        d5367, auto on: 382 tps
        ==> -30%

        clean, auto off: 2799 tps (1 "bad" runs)
        d5367, auto off: 1671 tps (0 "bad" runs)
        ==> -40%

        — max checkpoint interval and log switch interval (single column key)
        clean, auto off: 6272 tps (3 "bad" runs)
        d5367, auto off: 4908 tps (0 "bad" runs)
        ==> -21.5%

        I tested the checkpoint interval and log switch settings because I saw runs with huge performance drops with the default settings. I wasn't able to get rid of these entirely. What I saw was runs with auto-commit off whose performance dropped roughly to the performance of the runs with auto-commit on. I haven't investigated further, but I'd like to know if the drops are limited duration events, or if the performance drop is permanent.
        The numbers above are for worst case scenarios.

        I'm currently testing an alternative patch to see how performance is affected when we optimize to avoid updating all fields when there are no collated types in the conglomerate. This has the drawback that concerned Mike - there will be multiple code paths and the new code executed for collated types only will be a lot less tested.
        This can be optimized further for composite keys, but that part is not high priority for me right now.
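
        The actual benchmark source is attached to the issue; the sketch below only illustrates the kind of loop described above (insert plus delete of the same key as one transaction, with a timed warmup and measurement phase). The schema, durations, and names are assumptions, not the attached code.

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;
        import java.sql.Statement;

        public class InsertDeleteBench {
            public static void main(String[] args) throws Exception {
                // Re-running will fail on the CREATE TABLE; drop the database first.
                try (Connection conn = DriverManager.getConnection("jdbc:derby:benchdb;create=true")) {
                    conn.setAutoCommit(false);
                    try (Statement st = conn.createStatement()) {
                        st.executeUpdate("CREATE TABLE t (id INT PRIMARY KEY, val VARCHAR(10))");
                    }
                    conn.commit();
                    try (PreparedStatement ins = conn.prepareStatement(
                                 "INSERT INTO t (id, val) VALUES (1, 'x')");
                         PreparedStatement del = conn.prepareStatement(
                                 "DELETE FROM t WHERE id = 1")) {
                        runFor(conn, ins, del, 60_000L);             // warmup, result discarded
                        long txs = runFor(conn, ins, del, 300_000L); // steady state
                        System.out.println("tps: " + (txs / 300.0));
                    }
                }
            }

            // Insert and delete the same key as one transaction until the time is up.
            private static long runFor(Connection conn, PreparedStatement ins,
                                       PreparedStatement del, long millis) throws Exception {
                long end = System.currentTimeMillis() + millis;
                long txs = 0;
                while (System.currentTimeMillis() < end) {
                    ins.executeUpdate();
                    del.executeUpdate();
                    conn.commit();
                    txs++;
                }
                return txs;
            }
        }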

        Kristian Waagan made changes -
        Kristian Waagan added a comment -

        Attaching patch 3a, which I'm running tests with.

        Can you specify in some more detail what you mean by a normal mix of insert/delete?

        I have started the first batch of tests for the simple case - insert and delete the same key over and over again.

        Mike Matrigali added a comment -

        For benchmarks, it seems like we would like to know results for 2 different types of benchmark. The worst case for this code would be primary key insert and delete repeated using the same key; I think that is what you are doing. Then also just a normal mix of insert/delete. The question I would be trying to understand is whether it is worth the overhead to the "normal" path so that we can then optimize for the worst-case path. Basically, a 3.6% degradation for that worst case, which I don't think is a normal usage pattern, may be ok. What I don't want to see is a slowdown in the "normal" case, which I think we might have seen with patch 1 that did work in the normal path to optimize the worst-case path.

        I think option 2 is a reasonable approach for a bug fix to existing codelines. It seems safe to me, and a good first step. And doing it without special-casing gets us the full testing that caught the previous problem with the patch.

        With more thought, I don't think we should do option 1. I think it might make sense to log a future project for the trunk to look into just going ahead and purging the deleted row and then retrying the insert. While we have the latch we may just want to purge all committed deleted rows, in a similar fashion to what we do prior to splitting the page.
        The problem is that purge is tricky in that it has to be committed while holding the latch on the page; you can't just call it in-line in the current code. And you have to make sure that if you fork a transaction, the parent does not conflict with the child doing the purge, otherwise it might loop forever. The purge would also address the RESOLVE in the code right now (since that comment was written, HeapRowLocations have become variable in size based on CompressedNumber). And with the specific issue causing this bug, it is also possible that updating "equal" fields might need more space, given that different characters and character sequences can be "equal". So the problem is possibly even worse for the collation case.

        Kristian Waagan added a comment -

        derbyall passed too.
        Initial perf numbers seem to be comparable with the previous ones when using the in-memory back end.
        I'll start some longer runs tomorrow with the on-disk back end and upload the test code.

        Kristian Waagan added a comment -

        Thanks for that information, Mike!

        I have written a patch for option 2 already, and suites.All ran without failure (sane build). I'll run derbyall too, and run a first simple performance experiment. I suppose the performance degradation will become worse with the number of key fields.

        Another option would be to combine 2 with an improved version of what I did in my first patch - only update the full row if the data types in the key are collation sensitive.

        I have no experience in the area of option 1, so I will probably be unable to get something working before Monday...

        FYI, I saw the -1 with a sane build. The record isn't found and the store (I don't remember the method name) returns -1 as the slot.

        Mike Matrigali added a comment -

        I think the problem is that the updateAtSlot interface cannot be used in btrees in general. It only supports physical undo, which means that the row must be in exactly the same spot on the page as it was originally. It is used in a couple of places in the btree code on the control row (i.e. the row that is always in slot 0). But otherwise rows move around in btrees, and that is the purpose of the logical undo logic. Updates of rows in btrees are always done as deletes followed by inserts. But that does not help in this special case. The updateFieldAtSlot interface does support logical undo.

        Off hand seems like we could:
        1) Write an updateAtSlot that does logical undo. Before this it was not needed, because an update of a btree row would never "know" that it was going back to the same slot, so it was always cleaner to just do the delete and insert. I would be willing to help with this, let me know.

        2) A little ugly, but possibly interesting as a quick fix to see if it works: you could do updateFieldAtSlot for each field in the row.

        I will think about this over night and see if I can come up with anything simpler.

        Mike Matrigali added a comment -

        On the benchmark, I assume that was your original patch. 3.6% seems high to me. I understand that using the in-memory back end this is likely the worst case.

        Mike Matrigali added a comment -

        Not sure what is going on. Did you run the tests in SANE mode at all? That -1 is sometimes indicative of some sort of page corruption, with something overwriting something else. Sometimes the SANE code will give a different, more obvious error.

        Kristian Waagan added a comment -

        Note to self that the new test should specify the English locale (there may not be a collation for the locale specified/used by the user).

        Kristian Waagan made changes -
        Attachment derby-5367-2a-minimal_fix.diff [ 12492647 ]
        Kristian Waagan added a comment -

        Attaching patch 2a, which produces the errors mentioned in the above comment.
        Not for commit.

        Kristian Waagan added a comment -

        Thanks, Mike.

        First, I wrote a very simple micro-benchmark. It ran an insert followed by a delete on a table with a primary key in a loop.
        With 30 seconds warm-up, 60 seconds steady state and the average of three runs per codebase, I ended up with a performance degradation of around 3.6% (insane build, using the in-memory back end). I'll run this with the disk-based back end later, but first I have another issue to deal with - so these numbers may be invalid after all...

        With the simplest fix possible, which is to replace the updateFieldAtSlot with updateAtSlot, I'm getting a total of 33 errors in suites.All. I didn't see these when I tested the first patch, probably because the failing tests aren't using collations so the new code was never run in these tests. Here's what the core exception looks like, which can be reproduced by running lang.AlterTableTest (need only dropColumn + for instance testJira3175):
        java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.derby.impl.store.raw.data.BasePage.getHeaderAtSlot(BasePage.java:1881)
        at org.apache.derby.impl.store.raw.data.StoredPage.storeRecordForUpdate(StoredPage.java:7316)
        at org.apache.derby.impl.store.raw.data.StoredPage.storeRecord(StoredPage.java:7185)
        at org.apache.derby.impl.store.raw.data.UpdateOperation.undoMe(UpdateOperation.java:201)
        at org.apache.derby.impl.store.raw.data.PhysicalUndoOperation.doMe(PhysicalUndoOperation.java:147)
        at org.apache.derby.impl.store.raw.log.FileLogger.logAndUndo(FileLogger.java:533)
        at org.apache.derby.impl.store.raw.xact.Xact.logAndUndo(Xact.java:374)
        at org.apache.derby.impl.store.raw.log.FileLogger.undo(FileLogger.java:1015)
        at org.apache.derby.impl.store.raw.xact.Xact.abort(Xact.java:952)
        at org.apache.derby.impl.store.access.RAMTransaction.abort(RAMTransaction.java:1985)
        at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.doRollback(GenericLanguageConnectionContext.java:1767)
        at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.userRollback(GenericLanguageConnectionContext.java:1675)
        at org.apache.derby.impl.jdbc.TransactionResourceImpl.rollback(TransactionResourceImpl.java:245)
        at org.apache.derby.impl.jdbc.EmbedConnection.rollback(EmbedConnection.java:1838)
        at org.apache.derbyTesting.junit.JDBC.cleanup(JDBC.java:224)
        at org.apache.derbyTesting.junit.BaseJDBCTestCase.tearDown(BaseJDBCTestCase.java:408)

        I do see that there are some differences related to undo-logic for the two method calls, but that may very well be a red herring. The problem does seem to be related to logging and transactions.
        So, there's obviously something I don't understand here... I'll continue digging tomorrow - any hints would be helpful.

        Kristian Waagan made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        Mike Matrigali added a comment -

        A comment on the caching of the work to determine if any collation fields exist: the current patch optimizes this on a per-open-conglomerate basis. Absolute performance would be best if users did all their inserts in one open, but many do not. A better place to optimize this would be at compile time; this is the usual place to push these kinds of optimizations in Derby. I have not done work in this area for quite a while, but the DynamicCompiledOpenConglomInfo of OpenBTree is the place to do such compile-time optimizations. If changes are done in this area, it is important to bump the version number to invalidate existing compiled stored plans.

        As I suggested I think it would be best to first check in a simple unoptimized fix, and then if necessary do the optimization based on measured
        perf diffs in the future.

        Mike Matrigali added a comment -

        I looked at the patch, and I am leaning toward it not being worth putting in the special-case code for collation. I think it would be better if you just got rid of the optimization of only updating the row location, and always do the whole-row update in the case of a matching row. This path is not a very common path, and it is better to keep the code simple and not make the fast insert path for all inserts deal with checking for collation. An extra benefit is that we will get better testing with one code path and can optimize later if we need to.

        Do leave a comment explaining why you are not doing the optimization, as it will seem strange without at least a reference to collation and why 2 things that compare the same might not actually be the same value.

        I do agree that in general the fix is the right one, in that you need to update the whole row rather than part of it in the collation case.

        Either as part of this fix or in another JIRA, I believe the other block that handles all fields being the same should also be fixed. I posted a guess at those code paths, but to be sure, the easiest approach is probably to add a stack trace dump to standard out and run the full test suite. Again, I think it would be reasonable to just always remove the optimization and do a full update, with a comment.

        Mike Matrigali added a comment -

        I have not looked at patch yet. First some answers to previous questions.

        As to the following:
        a) Deoptimize insert - always insert a new index row.

        This would be a bad idea. Algorithms in the btree depend on every leaf row in the tree being unique, whether the rows are deleted or not. There are 2 cases to consider: unique and non-unique btrees. In the case of non-unique btrees, each leaf row is made unique by including the trailing row location as part of the whole key. In the case of unique btrees, the keys should be unique without the row location, and thus the search algorithms don't include it as part of the key.

        The kinds of things that depend on this may include the following:
        o The various binary searches on the leaves assume a single unique final destination; if the code allows multiple duplicate leaf rows, then I think you might randomly end up in the middle of a list of keys, depending on the binary search, and the code does not expect to reach a destination and "back up" to the beginning of a list.
        o The check for raising a duplicate-key error assumes the binary search will get to exactly the ONE row that matches, depending on the unique vs. non-unique parameters. I think bugs will arise if there is more than one matching row.

        As to the following question:
        >The block above the one where I applied the fix deals with the case where also the row location is the same.
        >When does this case happen, and is it guaranteed that the row values are correct even when a (non-default) collation is being used?
        Logically the code should handle this, as Derby theoretically allows for reusable record ids. In practice heap pages are marked to not allow them, so I don't think this is a normal event. It might happen as part of particular paths through crash recovery where we do logical undo vs. physical undo on btree pages.

        I am not sure about the question about non-default collation. At a high level I believe it should work, and the store should just rely on the result of the collation code doing the compare. If it says 2 different things are the same, then it should just treat them as the same.

        Next I will look at the patch. Sorry about the delay - I have been out and away from the list for a while.

        Kristian Waagan made changes -
        Assignee Kristian Waagan [ kristwaa ]
        Kristian Waagan made changes -
        Attachment derby-5367-1b-update_row_fully.diff [ 12490427 ]
        Attachment derby-5367-1b-update_row_fully.stat [ 12490428 ]
        Kristian Waagan added a comment -

        Thanks for testing and reviewing the patch, Brett.
        I will work hard to have a fix committed in time for the upcoming 10.8.2-release, but I'd still like to get a comment from one of the store experts.
        The store is tricky, and maybe there are other places in the code that aren't prepared for collations.

        Attaching patch 1b, with the following changes:
        o moved the check for collated types inside the sp.resultExact block. A further change may be to remove the instance variables altogether.
        o added a regression test in CollationTest2.

        There might be room for improvement in the wording (both comments and method names), since my vocabulary related to collations is rather thin. Also, I believe we are applying a collation order even when there is no territory specified, so DVD.isCollatedType is maybe confusing?

        Another question:
        The block above the one where I applied the fix deals with the case where also the row location is the same.
        When does this case happen, and is it guaranteed that the row values are correct even when a (non-default) collation is being used?

        Brett Wooldridge added a comment -

        Thanks Kristian.

        I have reviewed and tested the patch, and can't find any issues with it. It seems like the minimal patch to existing code that would fix the bug.

        Kristian Waagan made changes -
        Issue & fix info [Patch Available]
        Kristian Waagan made changes -
        Attachment derby-5367-1a-update_row_fully.diff [ 12490246 ]
        Kristian Waagan added a comment -

        Attaching patch 1a, which replaces/updates the whole row if a deleted row is reused and collations are enabled.

        Patch ready for discussion and review.
        It passed the regression tests and the bug demonstrated by the repro is addressed. I don't really expect to commit this one, but it demonstrates one possible solution.

        Dag H. Wanvik added a comment -

        I think we would want to avoid turning off the optimization if we don't have to.
        But is there a way to know which collations are "safe"? I think our default collation uses binary UTF comparisons, so those should be ok, cf. http://db.apache.org/derby/docs/10.8/devguide/cdevcollation.html . Perhaps you can determine whether a non-default collation is active and only deoptimize (or do an extra compare to determine whether de-opting is needed) in those cases.

        Kristian Waagan added a comment - - edited

        Turns out the problem here is the insert into the index (see BTreeController.doIns).
        There is optimization code in there to undelete a deleted row in a unique index if the "same" row is inserted again before the row has been purged. This assumes that the fields that are part of the key are equal, but this assumption may be broken when using collations.

        The insert code is smart enough to update the row location of the index row, but it doesn't understand that the value stored in the base row has changed.

        I'm not very familiar with this code, so I'd like to get some feedback from more knowledgeable people.
        To me these possible solutions come to mind:
        a) Deoptimize insert - always insert a new index row.
        b) Make conglomerate code aware of collations and take appropriate actions after comparing the new and the old value(s)

        I don't know what consequences the proposed solutions bring to the table.
        A quick and dirty fix of the code (where I updated the key field) made the repro pass.
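
        The broken assumption described above - that keys which compare as equal are interchangeable - is exactly what PRIMARY-strength collation introduces. A small self-contained illustration using java.text.Collator (plain JDK code, not Derby internals):

        import java.text.Collator;
        import java.util.Locale;

        public class PrimaryCollationDemo {
            public static void main(String[] args) {
                Collator collator = Collator.getInstance(Locale.ENGLISH);
                collator.setStrength(Collator.PRIMARY); // ignore case and accents

                // The index's duplicate check sees these as the same key...
                System.out.println(collator.compare("Test", "test")); // 0 => "equal"
                // ...but the stored bytes differ, so reusing the deleted index row
                // without rewriting its key leaves the old spelling in the index.
                System.out.println("Test".equals("test")); // false
            }
        }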

        Kristian Waagan made changes -
        Field Original Value New Value
        Affects Version/s 10.9.0.0 [ 12316344 ]
        Urgency Urgent
        Kristian Waagan added a comment -

        Verified the repro on trunk (Solaris 11 Express).
        Reset the urgency flag; that's for the release manager to specify.

        Compressing the table makes the problem go away - this further strengthens the suspicion of a corrupted index.
        The bug is also seen when issuing a delete without a where-clause.

        Brett Wooldridge created issue -

          People

          • Assignee: Kristian Waagan
          • Reporter: Brett Wooldridge
          • Votes: 0
          • Watchers: 2
