OpenJPA
OPENJPA-1163

Data consistency issues while modifying collections.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.1, 2.0.0-M3
    • Component/s: kernel
    • Labels: None
    • Environment: OpenJPA trunk, Derby DB.
    • Patch Info: Patch Available

      Description

      There are data consistency issues when modifying a large number of elements in a collection versus a small number of elements.

      Following is a detailed explanation of the issue, with an example:

      • Entity A has a collection of AItem entities with cascade ALL.
      • Test case:
        Clear all the data inside the tables representing entities A and AItem.
        Create 3 entity managers: em1, em2 and em3.

      em1.begin()
      create A on em1 with id "1"
      add 10 elements of AItems (id's from 0-9) to the created A(id 1).
      persist A.
      em1.commit()

      em1.begin()
      merge A ( created in the previous step)
      Remove 3 elements of AItems from the merged A.
      Add 3 elements of AItems ( id's 10,11,12) to the merged A (id 1).

      Without committing em1:

      em2.begin()
      query database to fetch A and construct object result2 of entity A.
      Add 3 elements of AItems ( id's 13,14,15) to fetched A ( result2)

      em2.commit()
      em1.commit()

      em3.begin()
      query database to check the size of AItems that are related to A ( id 1)
      em3.commit()

      em3's query for AItems related to A returns 13, as expected:
      13 = initial 10 - em1's 3 removed + em1's 3 added + em2's 3 added.

      When the same test case is repeated removing and adding 10 elements instead of 3, I get wrong results.

      Add initial 10 AItems (id's 0-9) for A.
      commit()

      em1 will remove 10 AItems from the collection of A.
      em1 will add 10 AItems (id's 10-19) to collection of A.

      em2 will add 10 AItems (id's 20-29) to collection of A.

      Commit em2.
      Commit em1.

      Then, instead of 20 elements (initial 10 - em1's 10 + em1's 10 + em2's 10), I see only 10 elements.

      The 10 elements that I see are em1's added AItems (id's 10-19).

      I think the cause of the issue is that when the number of modified elements exceeds the initial element count of the collection, collection change tracking is disabled and OpenJPA falls back to the following:
      – delete everything from the collection;
      – insert the data back into the collection.
      While inserting the data back, it does not consider the dirty records (em2's 10 added elements) because collection tracking is disabled.
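
      The scenario can also be written down as a short program. The following is a minimal sketch of the reproduction, assuming resource-local transactions; the entity classes, field names, and the persistence unit name "test" are illustrative assumptions, not taken from the attached testcase.

      import java.util.ArrayList;
      import java.util.Collection;
      import javax.persistence.*;

      @Entity
      class A {
          @Id int id;
          @OneToMany(cascade = CascadeType.ALL)
          Collection<AItem> items = new ArrayList<AItem>();
      }

      @Entity
      class AItem {
          @Id int id;
      }

      public class CollectionConsistencyRepro {
          public static void main(String[] args) {
              EntityManagerFactory emf = Persistence.createEntityManagerFactory("test");
              EntityManager em1 = emf.createEntityManager();
              EntityManager em2 = emf.createEntityManager();
              EntityManager em3 = emf.createEntityManager();

              // Seed: A(id=1) with AItems 0-9.
              em1.getTransaction().begin();
              A a = new A();
              a.id = 1;
              for (int i = 0; i < 10; i++) {
                  AItem item = new AItem();
                  item.id = i;
                  a.items.add(item);
              }
              em1.persist(a);
              em1.getTransaction().commit();

              // em1: remove all 10 items and add items 10-19, without committing yet.
              em1.getTransaction().begin();
              A merged = em1.merge(a);
              merged.items.clear();
              for (int i = 10; i < 20; i++) {
                  AItem item = new AItem();
                  item.id = i;
                  merged.items.add(item);
              }

              // em2: concurrently fetch the same A, add items 20-29, and commit.
              em2.getTransaction().begin();
              A result2 = em2.find(A.class, 1);
              for (int i = 20; i < 30; i++) {
                  AItem item = new AItem();
                  item.id = i;
                  result2.items.add(item);
              }
              em2.getTransaction().commit();

              // Commit em1's pending changes last.
              em1.getTransaction().commit();

              // em3: 20 AItems are expected; the bug leaves only em1's 10 (id's 10-19).
              em3.getTransaction().begin();
              Number count = (Number) em3.createQuery(
                  "select count(i) from AItem i").getSingleResult();
              System.out.println("AItems: " + count);
              em3.getTransaction().commit();
          }
      }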

      Attachments

      1. OPENJPA-1163_trunk.patch (32 kB) - Ravi P Palacherla


          Activity

          Donald Woods added a comment -

          add missing Fix version for trunk

          David Ezzio added a comment -

          From trunk, merged fix to 1.1.x branch at rev 802218

          Michael Dick added a comment -

          > So the end result is that the data inserted into the database depends on the number of modifications made to the collection.
          > I think it is a bug, as the data inserted into the database should be consistent irrespective of the number of modifications
          > made to the collection.

          I agree, the number of modifications should not have an effect.

          > 2) When the data modified in two concurrent transactions does not interfere, both transactions should win.
          > For example, consider table A, which has 5 rows (primary keys 1-5) and row-level locking.
          > Transaction 1 modifies rows 1-3 and transaction 2 modifies rows 4-5 at the same time.
          > In this case I think both transactions have to win.

          IMO this really depends on who owns the relationship or the collection.

          For example, this model:

          @Entity
          public class Manager {
              @OneToMany(mappedBy="manager")
              private Collection<Employee> employees;
          }

          @Entity
          public class Employee {
              @ManyToOne
              private Manager manager;
          }

          Employee owns the relationship, and the Employee table has a foreign key. Any Employee with a FK to a particular Manager is considered that manager's Employee. In this case we should see the behavior Ravi is advocating.

          The following example is slightly different:

          @Entity
          public class Manager {
              @OneToMany
              private Collection<Employee> employees;
          }

          @Entity
          public class Employee {
              // no reference to Manager
          }

          In this case the Manager object owns the relationship and the set of Employees, and I'm less certain how we should behave. My original take was that the last to commit wins: Manager owns the state of the relationship and bears the sole burden of maintaining it.

          I'm not sure that's the ideal solution, but I didn't feel confident enough in the answer to change the default.

          Ravi P Palacherla added a comment -

          >> If this issue is a defect then the default value should flip.

          The reason for not switching the default value is that I am not completely confident the previous behavior is a defect.
          So, without confirmation of whether it is a defect, I felt it convenient for existing applications using the current default
          value to upgrade to the latest versions without needing any additional properties. There is also a doc JIRA (OPENJPA-1223) open for documenting this.

          In my opinion the previous value is a defect.
          Maybe I am wrong, but I think it is a defect based on the following:

          1) The previous default behavior is as follows:
          When the number of modifications (add/remove) to a collection exceeds the initial size of the collection, the logic is to
          remove everything from the collection and re-insert the objects. In the process of re-inserting, it does not consider data
          that is modified as part of another transaction.

          So the end result is that the data inserted into the database depends on the number of modifications made to the collection.
          I think it is a bug, as the data inserted into the database should be consistent irrespective of the number of modifications
          made to the collection.

          2) When the data modified in two concurrent transactions does not interfere, both transactions should win.
          For example, consider table A, which has 5 rows (primary keys 1-5) and row-level locking.
          Transaction 1 modifies rows 1-3 and transaction 2 modifies rows 4-5 at the same time.
          In this case I think both transactions have to win.
          The above is not possible with the default value of autoOff when the number of modifications to A exceeds 5.

          If you think adding an option to ProxyManager is a better fit than a compatibility configuration, I will modify my fix.
          May I ask whether there is any additional advantage to it other than the smaller patch footprint?

          Regards,
          Ravi.

          Pinaki Poddar added a comment -

          Hi Ravi/Mike,
          As you have analyzed the issue, do you think the earlier value of autoOff is a defect, or a behavior that we must retain with a Compatibility option?

          If it is only a defect that was not exposed until this issue, then there may not be a strong case for a compatibility option.

          If it is not a defect and we need to retain both the old value and the new value, then instead of adding a separate compatibility option, consider
          autoOff as one more configurable option of ProxyManager. ProxyManager already has a few boolean configurable properties, so this new
          choice will fit neatly (and the patch footprint will shrink).
          If this issue is not a defect, the default value should remain as before, but the documentation should explain when the configuration should be switched.
          If this issue is a defect, the default value should flip, but the documentation should explain how the configuration can be switched back to the backward-compatible behavior.

          Michael Dick added a comment -

          Hi Ravi,

          Great testcases (sorry this was a long time coming). I wouldn't recommend using this approach for relationships, but the PersistentCollection is an interesting wrinkle.

          Thanks very much for your hard work on this issue.

          Ravi P Palacherla added a comment -

          Forgot to attach the testcase.
          The current patch contains both the test case and the fix.

          Regards,
          Ravi.

          Ravi P Palacherla added a comment -

          The attached patch contains the fix along with the testcase.
          It takes the autoOff parameter from the openjpa.Compatibility option.
          The default behavior is whatever currently exists without the patch.

          Regards,
          Ravi.
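
          For illustration, a hypothetical way to flip the flag when creating the EntityManagerFactory; the "autoOff" name comes from the patch description above, and the exact property syntax may differ in the committed fix.

          import java.util.HashMap;
          import java.util.Map;
          import javax.persistence.EntityManagerFactory;
          import javax.persistence.Persistence;

          public class AutoOffConfig {
              public static void main(String[] args) {
                  Map<String, Object> props = new HashMap<String, Object>();
                  // Assumed semantics: autoOff=false keeps collection change
                  // tracking enabled no matter how many elements are modified.
                  props.put("openjpa.Compatibility", "autoOff=false");
                  EntityManagerFactory emf =
                      Persistence.createEntityManagerFactory("test", props);
                  emf.close();
              }
          }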

          Ravi P Palacherla added a comment -

          Hi,

          I have simplified the test case now.
          It has only one entity, "A".
          A is unversioned and has an id (primary key), name, age and a map.

          A's initial values:
          A(id, age, name, map) = A(1, 30, "Initial", map keys 0-9) (map size 10).

          Trans1:
          A.Age = 40;
          Remove 8 elements from the map.
          Add 8 elements to the map (keys 10-17).
          Trans2:
          A.Name = "Changed"
          Add 8 elements to the map (keys 20-27).

          Commit Trans2.
          Commit Trans1.

          Results:

          A's values:
          A(id, age, name, map) = A(1, 40, "Changed", map keys 8, 9, 10-17) (map size 10).

          When the same test is repeated adding and removing 3 elements (instead of 8),
          A's values are:
          A(id, age, name, map) = A(1, 40, "Changed", map keys 3-9, 10-12, 20-22) (map size 13).

          The question is: which of the above behaviors is correct?

          Should the whole map be replaced by Trans1 (as it is the last to commit), or
          should the map be a combination of the changes made in Trans1 and Trans2?

          Looking at the other fields of A, the changes made to age and name by Trans1 and Trans2 are both applied.
          When it comes to the map, the results differ.

          Regards,
          Ravi.
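
          For reference, the simplified testcase above can be sketched as follows. This is a compact illustration under assumed names (entity A2, an element-collection map), not the attached code.

          import java.util.HashMap;
          import java.util.Map;
          import javax.persistence.*;

          @Entity
          class A2 {
              @Id int id;
              int age;
              String name;
              @ElementCollection
              Map<Integer, String> map = new HashMap<Integer, String>();
          }

          public class MapConsistencyRepro {
              // n = 3 merges both transactions' changes (map size 13);
              // n = 8 loses Trans2's added entries (map size 10).
              static void run(EntityManagerFactory emf, int n) {
                  EntityManager tran1 = emf.createEntityManager();
                  EntityManager tran2 = emf.createEntityManager();

                  tran1.getTransaction().begin();
                  A2 a1 = tran1.find(A2.class, 1);
                  a1.age = 40;
                  for (int i = 0; i < n; i++) {
                      a1.map.remove(i);               // remove n initial entries (keys 0..n-1)
                      a1.map.put(10 + i, "t1-" + i);  // add n new entries (keys 10..)
                  }

                  tran2.getTransaction().begin();
                  A2 a2 = tran2.find(A2.class, 1);
                  a2.name = "Changed";
                  for (int i = 0; i < n; i++) {
                      a2.map.put(20 + i, "t2-" + i);  // concurrent adds only (keys 20..)
                  }

                  tran2.getTransaction().commit();    // Trans2 commits first,
                  tran1.getTransaction().commit();    // then Trans1
              }
          }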

          Ravi P Palacherla added a comment -

          Hi Mike,

          Thanks a lot for spending your time on this.

          >> It's very interesting that the behavior is different when you add more changes (exceed 10).

          CollectionChangeTrackerImpl's add() and remove() methods should give more info about this.
          If the number of added objects plus the number of removed objects is greater than the collection size, the ChangeTracker is forcibly disabled.
          If the ChangeTracker is disabled, OpenJPA's update strategy is to delete all and re-insert all objects.
          So while inserting everything back, it cannot see the objects inserted as part of em2 (in my sample).
          Hence this bug.
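
          To make the heuristic concrete, here is a simplified sketch of the behavior described above; it is not the actual CollectionChangeTrackerImpl source, just an illustration of the auto-off rule.

          import java.util.HashSet;
          import java.util.Set;

          class ChangeTrackerSketch<E> {
              private final Set<E> added = new HashSet<E>();
              private final Set<E> removed = new HashSet<E>();
              private final int initialSize;
              private boolean tracking = true;

              ChangeTrackerSketch(int initialSize) {
                  this.initialSize = initialSize;
              }

              void recordAdd(E e)    { record(added, removed, e); }
              void recordRemove(E e) { record(removed, added, e); }

              private void record(Set<E> target, Set<E> opposite, E e) {
                  if (!tracking)
                      return;
                  // An add cancels a pending remove of the same element, and vice versa.
                  if (!opposite.remove(e))
                      target.add(e);
                  // The auto-off rule: once the change count exceeds the initial
                  // size, tracking stops. A later flush then falls back to
                  // delete-all/insert-all, silently overwriting rows inserted by
                  // a concurrent transaction.
                  if (added.size() + removed.size() > initialSize) {
                      tracking = false;
                      added.clear();
                      removed.clear();
                  }
              }

              boolean isTracking() {
                  return tracking;
              }
          }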

          >> What bothers me about it is that you've updated a versioned entity (newA and result2) twice but didn't get an Optimistic Lock Exception.

          Excellent concern.
          I previously thought that the AItem objects modified in tran 1 and tran 2 do not interfere with each other and hence should not result in an optimistic lock exception.
          Now I understand that both tran 1 and tran 2 work on the same row of A, which means that at the time of tran 1's commit I should get an optimistic lock exception.

          I investigated why I am not getting the optimistic lock exception and changed the method-level version annotation to field level.
          I now get optimistic lock errors when the version annotation is on the field rather than on the method.
          I think the version annotation on methods has some issue.

          With the version annotation on a method, when I check the db tables I do not see the corresponding version column.
          After moving the version attribute to the field, I can see the version column on the table and I get the optimistic lock exception.

          So in my example, both A and AItems were effectively treated as unversioned entities.
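
          For reference, field-level versioning looks like the following minimal sketch. In JPA the placement of mapping annotations must match the entity's access type, so an @Version on a getter is ignored when the rest of the entity is annotated on fields, which would explain the missing version column.

          import javax.persistence.Entity;
          import javax.persistence.Id;
          import javax.persistence.Version;

          @Entity
          public class VersionedA {
              @Id
              private int id;

              @Version
              private int version; // mapped to a version column; enables optimistic locking
          }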

          In the process of answering your concerns, I realized that I definitely have to revisit my testcase.
          I will try to modify the test case so that it properly demonstrates the current issue.
          I think the issue described in this JIRA is valid, but my test case needs to be revisited.
          Do you agree?

          Thanks again,
          Ravi.

          Michael Dick added a comment -

          Hi Ravi,

          I'm sorry, I misread the test results; I'd made some changes and may have changed the testcase.

          I believe that both times the expected result should be 10 AItems (pretend A is unversioned). The last commit contains an instance of A that is related to 10 AItems. A has a unidirectional OneToMany relationship with AItem. Since AItem does not have a relationship with A, A is the owner of the relationship, so the last transaction to update A:id=1 wins (or at least so I thought).

          It's very interesting that the behavior is different when you add more changes (exceed 10).

          What bothers me about it is that you've updated a versioned entity (newA and result2) twice but didn't get an Optimistic Lock Exception. I'd think that is the correct behavior in this case (because A owns the relationship, it's really A that gets updated). That won't help for unversioned entities, though - they'll still get the odd behavior you've found.

          Ravi P Palacherla added a comment -

          Hi Mike,

          In my example, the data handled by Trans1 and Trans2 does not interfere.
          Trans1 and Trans2 work on different objects and different rows of the table.
          Hence I fail to understand what you meant by "This patch demonstrates a race condition more than anything else."

          The whole idea of my testcase is to demonstrate that the results in the database differ when I execute the same code but update a different number of rows.

          AItems has 10 rows.
          Trans1 --> deletes 10 rows (0 to 9),
          adds 10 rows (10 to 19).
          Trans2 --> adds 10 rows (20 to 29).
          Commit Trans2.
          Commit Trans1.

          Rows in table AItems after the above execution: (10 to 19) = 10 rows in total.

          The same code, when I manipulate only 3 rows:
          AItems has 10 rows.
          Trans1 --> deletes 3 rows (7 to 9),
          adds 3 rows (10 to 12).
          Trans2 --> adds 3 rows (20 to 22).
          Commit Trans2.
          Commit Trans1.
          Rows in table AItems after the above execution: (0 to 6, 10 to 12, 20 to 22) = 13 rows in total.

          If only Trans1 had won, the results should be (0 to 6, 10 to 12) = 10 in total, rather than 13.

          So the results in AItems differ even though the same code is executed.

          >> You mentioned "The last one to commit wins."
          When the data in the two transactions does not interfere, both transactions have to win, don't they?

          Please correct me if I am losing track.

          Thanks,
          Ravi.

          Michael Dick added a comment -

          Hi Ravi,

          This patch demonstrates a race condition more than anything else. You have two transactions. Tran 1 gets a copy of the entity, removes everything from its collection, then adds ten more items. Tran 2 gets a copy of the entity and just adds ten items. To show this, just print out the size of newA.getAItems() before committing tran 1.

          In tran 1 the entity has only 10 items; in tran 2 the entity has 20. The last one to commit wins. I'm guessing this is because it's a unidirectional relationship and A is the owner (even though the updates are in the AItem table), and therefore the state in A trumps the state in AItem.

          Interestingly enough, if you updated any field in A you should see an OptimisticLockException, which would explain the problem better.

          Michael Dick added a comment -

          Hi Ravi,

          Thanks for the patch. I think this sort of change needs a corresponding configuration option in the Compatibility class (maybe called limitChangesTracked?). The javadoc for org.apache.openjpa.conf.Compatibility can contain the information on when we changed the default behavior and why we made the change. If that isn't enough to go on, just let me know and I'll provide some more pointers.

          I think changing the default is the correct way to go, but I can also see existing applications that are unknowingly coded to the old behavior, so I'd like to give them an option to move forward. I'll take a closer look at the testcase tomorrow (might be evening), but at first glance the patch looks good.

          Thanks very much for finding and debugging this.

          Ravi P Palacherla added a comment -

          Attached is a test case and fix for the issue.


            People

            • Assignee: Ravi P Palacherla
            • Reporter: Ravi P Palacherla
            • Votes: 0
            • Watchers: 0
