OpenJPA
  1. OpenJPA
  2. OPENJPA-1873

EntityManager#merge sometimes passes wrong entity values to @PostLoad EntityListeners

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.0.1, 2.0.2
    • Fix Version/s: 2.2.0
    • Component/s: kernel
    • Labels:
      None

      Description

      I've tested this with the latest from branches/2.0.x.

      My entity has an @EntityListeners which observes the @PostLoad lifecycle event. This listener stores the 'old' values from the database for later use (see http://struberg.wordpress.com/2010/07/31/howto-changelog-with-jpa/ for the intention behind). All works well if the table has only a few rows. But if you add more rows, OpenJPA tries to optimize the access and only loads the @Version field + the dirty fields. In this case the merging seems to be wrong, because I get the NEW values from the dirty fields instead of the original values from the database passed to my @PostLoad method.

      Did cost me a few grey hairs to track down the differences between the working and the broken scenarios here But finally I was able to creat a unit test showing the problem

      1. postloadtest.zip
        11 kB
        Mark Struberg
      2. OPENJPA-1873-unittest.patch
        9 kB
        Mark Struberg
      3. openjpa-1873-hack-1.patch
        3 kB
        Rick Curtis
      4. OPENJPA-1873-fix3.patch
        27 kB
        Mark Struberg
      5. OPENJPA-1873-fix2.patch
        24 kB
        Mark Struberg
      6. OPENJPA-1873-fix1.patch
        17 kB
        Mark Struberg

        Activity

        Hide
        Albert Lee added a comment -

        Close issue in preparation for 2.2.0 release.

        Show
        Albert Lee added a comment - Close issue in preparation for 2.2.0 release.
        Hide
        Mark Struberg added a comment -

        Thanks for the review Kevin!

        Indeed, the problem I faced for a long time was that PostLoad was pretty much broken, because the payload entity you get into this event was always different, depending on the situation.

        I agree with what Rick and Pinaki pointed out above, that the spec is not really clear if PostLoad should get fired for a merge at all. But OpenJPA actually did fire it in some cases, and the entity which got handed to the event was a mixture between the entity in memory, the state on the database and the lazy-loaded delta. Again: it was not one of those 3, but it was a weird mixture with some fields nulled out, etc. And it was depending on _loaded, _dirty and the detached state. So basically the PostLoad event was not usable for users once the entity got detached.

        Because of that, I would be in favour to even make PostLoadOnMerge enabled by default - but that's a policy decision which is not up to me to decide. Please note that the behaviour only got changed if there is any PostLoad lifecycle listener registered on the entity at all!
        With the PostLoadOnMerge flag enabled, OpenJPA now behaves the same way as I saw with latest Hibernate and EclipseLink - I actually don't care much about those other impls, but I think it's at least not a bad thing

        I was not really sure about the String property based configuration, so I decided to go full route. If you say this is not supported anymore in openjpa-2.x, then we should think about Deprecating those parts at all?

        Regarding the documentation - thanks for the pointer! I thought those get generated out from the JavaDoc. Will fix those parts in the doc by adding the property.

        Show
        Mark Struberg added a comment - Thanks for the review Kevin! Indeed, the problem I faced for a long time was that PostLoad was pretty much broken, because the payload entity you get into this event was always different, depending on the situation. I agree with what Rick and Pinaki pointed out above, that the spec is not really clear if PostLoad should get fired for a merge at all. But OpenJPA actually did fire it in some cases, and the entity which got handed to the event was a mixture between the entity in memory, the state on the database and the lazy-loaded delta. Again: it was not one of those 3, but it was a weird mixture with some fields nulled out, etc. And it was depending on _loaded, _dirty and the detached state. So basically the PostLoad event was not usable for users once the entity got detached. Because of that, I would be in favour to even make PostLoadOnMerge enabled by default - but that's a policy decision which is not up to me to decide. Please note that the behaviour only got changed if there is any PostLoad lifecycle listener registered on the entity at all! With the PostLoadOnMerge flag enabled, OpenJPA now behaves the same way as I saw with latest Hibernate and EclipseLink - I actually don't care much about those other impls, but I think it's at least not a bad thing I was not really sure about the String property based configuration, so I decided to go full route. If you say this is not supported anymore in openjpa-2.x, then we should think about Deprecating those parts at all? Regarding the documentation - thanks for the pointer! I thought those get generated out from the JavaDoc. Will fix those parts in the doc by adding the property.
        Hide
        Kevin Sutter added a comment -

        Mark,
        Your patch/commit looks pretty straight forward. The JIRA has such a long history, I was wondering what you finally decided to resolve. It looks like you have provided a new property (openjpa.PostLoadOnMerge) which will tell OpenJPA to call the PostLoad listener on every Merge. If that's accurate, then some documentation updates would be good to have so that users know about this new property. Thanks.

        I also noticed that you updated the OpenJPAConfiguration file to define the OPTION_POSTLOAD_ON_MERGE string. I don't believe these are really even used any more. It looks like maybe this was the old (original) way to specify properties, but the updated plugin configuration is much easier – which you also implemented. Maybe I'm mistaken, but I don't think you need the OPTION_POSTLOAD_ON_MERGE updates. Did you find that you required these?

        And, it looks like you resolved the problem with incorrect data being dispensed to the PostLoad listener on a Merge – iff this new option is set. Is that correct?

        Thanks,
        Kevin

        Show
        Kevin Sutter added a comment - Mark, Your patch/commit looks pretty straight forward. The JIRA has such a long history, I was wondering what you finally decided to resolve. It looks like you have provided a new property (openjpa.PostLoadOnMerge) which will tell OpenJPA to call the PostLoad listener on every Merge. If that's accurate, then some documentation updates would be good to have so that users know about this new property. Thanks. I also noticed that you updated the OpenJPAConfiguration file to define the OPTION_POSTLOAD_ON_MERGE string. I don't believe these are really even used any more. It looks like maybe this was the old (original) way to specify properties, but the updated plugin configuration is much easier – which you also implemented. Maybe I'm mistaken, but I don't think you need the OPTION_POSTLOAD_ON_MERGE updates. Did you find that you required these? And, it looks like you resolved the problem with incorrect data being dispensed to the PostLoad listener on a Merge – iff this new option is set. Is that correct? Thanks, Kevin
        Hide
        Mark Struberg added a comment -

        Hi!

        I've now updated my patch to apply cleanly to the latest trunk!

        This is tested in 2 big real world projects and works fine now since a year. This is really an important issue, because currently the POST_LOAD lifecycle listeners are really broken. They e.g. currently also being called for parts which are lazily loaded - and in this case the lifecycle listener will get completely wrong data handed over!

        My patch is non-invasive because the 'fixed' handling will only be used if POST_LOAD_ON_MERGE gets enabled in persistence.xml.

        If noone objects, I'll go on and commit the patch tomorrow.

        Show
        Mark Struberg added a comment - Hi! I've now updated my patch to apply cleanly to the latest trunk! This is tested in 2 big real world projects and works fine now since a year. This is really an important issue, because currently the POST_LOAD lifecycle listeners are really broken. They e.g. currently also being called for parts which are lazily loaded - and in this case the lifecycle listener will get completely wrong data handed over! My patch is non-invasive because the 'fixed' handling will only be used if POST_LOAD_ON_MERGE gets enabled in persistence.xml. If noone objects, I'll go on and commit the patch tomorrow.
        Hide
        Mark Struberg added a comment -

        and here comes the next one, this time with a new 'openjpa.PostLoadOnMerge' configuration property.

        Tests are currently running....

        LieGrue,
        strub

        Show
        Mark Struberg added a comment - and here comes the next one, this time with a new 'openjpa.PostLoadOnMerge' configuration property. Tests are currently running.... LieGrue, strub
        Hide
        Mark Struberg added a comment -

        one thing: we should possibly suppress firing of the PostLoad event if postLoadOnMerge == false when reading the dirty attributes from the db.
        Because this still gets fired if all attributes have been dirty in the detached entity. Currently you sometimes get the @PostLoad called on merge, sometimes not ...

        Show
        Mark Struberg added a comment - one thing: we should possibly suppress firing of the PostLoad event if postLoadOnMerge == false when reading the dirty attributes from the db. Because this still gets fired if all attributes have been dirty in the detached entity. Currently you sometimes get the @PostLoad called on merge, sometimes not ...
        Hide
        Mark Struberg added a comment -

        Hi Rick!

        Here is my first take on the problem based on your initial suggestion.

        What do you think about introducing a configuration parameter like openjpa.PostLoadOnMerge ?

        There is a line in the patch marking the right spot:
        boolean postLoadOnMerge = true; //X make this configurable

        It seems to run fine in my real world application, but I just need to start the full OpenJPA compile with tests enabled over the night. So possibly still something left to fix...

        Show
        Mark Struberg added a comment - Hi Rick! Here is my first take on the problem based on your initial suggestion. What do you think about introducing a configuration parameter like openjpa.PostLoadOnMerge ? There is a line in the patch marking the right spot: boolean postLoadOnMerge = true; //X make this configurable It seems to run fine in my real world application, but I just need to start the full OpenJPA compile with tests enabled over the night. So possibly still something left to fix...
        Hide
        Rick Curtis added a comment -

        Attaching a hack for Mark to play around with.

        Show
        Rick Curtis added a comment - Attaching a hack for Mark to play around with.
        Hide
        Mark Struberg added a comment -

        Hi!

        This is just a small summary about the discussions I had with Rick at our yesterdays debugging session.

        It seems that the DetachStateManager is almost in the original state as imported by Patrick in early 2006 and might not reflect all things needed for JPA2.

        The attach() in line #108 is of special interrest. It consists of 3 parts:

        1.) (line #131 - #166) prepare old Values from the database to be able to restore those values in case of a rollback.
        comment: Imo this is NOT needed if openjpa.RestoreState is RESTORE_NONE!

        2.) (handled in the same block somehow) if optimistic locking is used (does the whole merging make any sense for row-locking at all?): Check the version in the database to make sure that we didn't have a concurrent modification already.

        3.) (line #168 - #295) perform the actual 'merging'. by setting the values from the toAttach entity into the one from the DB.

        so far so good.

        Now to the problematic spots.

        Depending on the situation (_loaded and _dirty states) 1.) xor 3.) fire the @PostLoad event. The logic for the lifecyle event is to get fired if all fields (or all fields of a FetchGroup if fgs are used) got set. If all fields have been made dirty in the detached entity then this will happen in 1.) If not, it will happen in 3.). The problem here is that in 1.) the entity from the DB is not complete and in 3.) has nothing to do with any load from the database, since it contains the dirty values from the toAttach entity.

        It's really problematic that both the load() from the database and the attach() use the same StateManagerImpl#storeStringField() without any distinction. So those highly different tasks will both fire the same @PostLoad lifecycle event. Imo this must not happen while attaching.

        Rick, are you d' accord so far?

        Show
        Mark Struberg added a comment - Hi! This is just a small summary about the discussions I had with Rick at our yesterdays debugging session. It seems that the DetachStateManager is almost in the original state as imported by Patrick in early 2006 and might not reflect all things needed for JPA2. The attach() in line #108 is of special interrest. It consists of 3 parts: 1.) (line #131 - #166) prepare old Values from the database to be able to restore those values in case of a rollback. comment: Imo this is NOT needed if openjpa.RestoreState is RESTORE_NONE! 2.) (handled in the same block somehow) if optimistic locking is used (does the whole merging make any sense for row-locking at all?): Check the version in the database to make sure that we didn't have a concurrent modification already. 3.) (line #168 - #295) perform the actual 'merging'. by setting the values from the toAttach entity into the one from the DB. so far so good. Now to the problematic spots. Depending on the situation (_loaded and _dirty states) 1.) xor 3.) fire the @PostLoad event. The logic for the lifecyle event is to get fired if all fields (or all fields of a FetchGroup if fgs are used) got set. If all fields have been made dirty in the detached entity then this will happen in 1.) If not, it will happen in 3.). The problem here is that in 1.) the entity from the DB is not complete and in 3.) has nothing to do with any load from the database, since it contains the dirty values from the toAttach entity. It's really problematic that both the load() from the database and the attach() use the same StateManagerImpl#storeStringField() without any distinction. So those highly different tasks will both fire the same @PostLoad lifecycle event. Imo this must not happen while attaching. Rick, are you d' accord so far?
        Hide
        Mark Struberg added a comment -

        The bug doesn't appear if I use

        <property name="openjpa.DetachState" value="fetch-groups"/>

        So this is definitely a bug imo, because OpenJPA must return the same results regardless which detach strategy one uses.

        Show
        Mark Struberg added a comment - The bug doesn't appear if I use <property name="openjpa.DetachState" value="fetch-groups"/> So this is definitely a bug imo, because OpenJPA must return the same results regardless which detach strategy one uses.
        Hide
        Mark Struberg added a comment -

        Hi Pinaki!

        The test contains a test entity PostLoadListenerEntity.
        This entity contains a field 'value' which I use in the test.
        If you add ONE single other value to the entity, the Object in the @PostLoad listener has a different value.

        Please comment out value2 to value12 in this entity (only id and value enabled) and rerun the test - it will pass now.
        Then comment value2 or any other additional field back in -> the test will fail.
        This is really inconsistent and should get fixed

        Btw I do not agree with you on the value of the @PostLoad data.
        The spec says:
        3.5.2
        "The PostLoad method for an entity is invoked after the entity has been loaded into the current persis-
        tence context from the database or after the refresh operation has been applied to it."

        and for the merge:
        3.1.1 "Merge the state of the given entity into the current persistence context."
        3.2.7.1 "The merge operation allows for the propagation of state from detached entities onto persistent entities managed by the entity manager." + "If X is a detached entity, the state of X is copied onto a pre-existing managed entity instance X' of the same identity or a new managed copy X' of X is created."

        I interpret this that the value we need to hand to the @PostLoad method ist per spec the state of the entity before the merge.
        The only thing we imo can take into question is if the @PostLoad needs to get called at all. Section 3.2.7 at least gives a few answers where we for sure do not need to load it - but 3.2.7.1 imo says that we should for merging detached entities:

        3.2 "A managed entity instance is an instance with a persistent identity that is currently associated with a persistence context."

        Of course you can say we don't need to go to the database to ensure the 'managed entity' has a 'persistent identity' but then you will fail if the database has been changed outside your application. Thus a load from the database is really required.

        I know that we only need to do what the spec says and not what any other JPA implementation does. But if 3 other major JPA impls act this way, than it might be a strong indicator at least

        txs and LieGrue,
        strub

        Show
        Mark Struberg added a comment - Hi Pinaki! The test contains a test entity PostLoadListenerEntity. This entity contains a field 'value' which I use in the test. If you add ONE single other value to the entity, the Object in the @PostLoad listener has a different value. Please comment out value2 to value12 in this entity (only id and value enabled) and rerun the test - it will pass now. Then comment value2 or any other additional field back in -> the test will fail. This is really inconsistent and should get fixed Btw I do not agree with you on the value of the @PostLoad data. The spec says: 3.5.2 "The PostLoad method for an entity is invoked after the entity has been loaded into the current persis- tence context from the database or after the refresh operation has been applied to it." and for the merge: 3.1.1 "Merge the state of the given entity into the current persistence context." 3.2.7.1 "The merge operation allows for the propagation of state from detached entities onto persistent entities managed by the entity manager." + "If X is a detached entity, the state of X is copied onto a pre-existing managed entity instance X' of the same identity or a new managed copy X' of X is created." I interpret this that the value we need to hand to the @PostLoad method ist per spec the state of the entity before the merge. The only thing we imo can take into question is if the @PostLoad needs to get called at all. Section 3.2.7 at least gives a few answers where we for sure do not need to load it - but 3.2.7.1 imo says that we should for merging detached entities: 3.2 "A managed entity instance is an instance with a persistent identity that is currently associated with a persistence context." Of course you can say we don't need to go to the database to ensure the 'managed entity' has a 'persistent identity' but then you will fail if the database has been changed outside your application. Thus a load from the database is really required. I know that we only need to do what the spec says and not what any other JPA implementation does. But if 3 other major JPA impls act this way, than it might be a strong indicator at least txs and LieGrue, strub
        Hide
        Pinaki Poddar added a comment -

        Mark,
        1. is the attached test in PostLoadTest.zip supposed to pass or fail?

        you wrote:
        > Fact: a @PostLoad must get called after loading from the database. Thus I'd say that it needs to get called even if entities only got loaded partly.
        I tend of agree. The notion of 'loaded partly' does not exist in the spce – it is an implementation issue.
        However, my view is that the lifecycle callback methods are tied to instance life cycle state transitions and not directly to operational methods such as find() or merge(). Of course, an operational method may cause a life cycle state transition. For example, a PostLoad callback is invoked when an entity state is loaded from the database. But whether a merge() operation will cause an entity state be loaded from database or not can vary based on other factors.
        Hence a question such as "does merge() must invoke postLoad()?" may not offer a definite answer. "Vendor X invoked postLoad() on merge()" – is not convincing enough for me.

        > But the spec also defines that the entity from the db needs to get posted.

        Can you please elaborate this point? Are you suggesting that the state of the argument instance of postLoad() method should represent the database values? If that is the case, then I do not agree. The postLoad() will see the state of the instance after the provider has loaded the instance in to its context – some of the properties can come from the database, some can come from the user-supplied instance – but the state as presented to the postLoad() method argument must be 'consistent' – meaning it must be the exact same state that the persistence context holds and will eventually be committed if nothing changes further.

        > Currently we get a mix between old and new values in @PostLoad which (I think we agree) is wrong.

        Such a mix, if proven, is inconsistent. But I could not see how the test you have submitted exposes such inconsistent behavior.

        Show
        Pinaki Poddar added a comment - Mark, 1. is the attached test in PostLoadTest.zip supposed to pass or fail? you wrote: > Fact: a @PostLoad must get called after loading from the database. Thus I'd say that it needs to get called even if entities only got loaded partly. I tend of agree. The notion of 'loaded partly' does not exist in the spce – it is an implementation issue. However, my view is that the lifecycle callback methods are tied to instance life cycle state transitions and not directly to operational methods such as find() or merge(). Of course, an operational method may cause a life cycle state transition. For example, a PostLoad callback is invoked when an entity state is loaded from the database. But whether a merge() operation will cause an entity state be loaded from database or not can vary based on other factors. Hence a question such as "does merge() must invoke postLoad()?" may not offer a definite answer. "Vendor X invoked postLoad() on merge()" – is not convincing enough for me. > But the spec also defines that the entity from the db needs to get posted. Can you please elaborate this point? Are you suggesting that the state of the argument instance of postLoad() method should represent the database values? If that is the case, then I do not agree. The postLoad() will see the state of the instance after the provider has loaded the instance in to its context – some of the properties can come from the database, some can come from the user-supplied instance – but the state as presented to the postLoad() method argument must be 'consistent' – meaning it must be the exact same state that the persistence context holds and will eventually be committed if nothing changes further. > Currently we get a mix between old and new values in @PostLoad which (I think we agree) is wrong. Such a mix, if proven, is inconsistent. But I could not see how the test you have submitted exposes such inconsistent behavior.
        Hide
        Rick Curtis added a comment -

        > Do we have someone representing the ASF on this JSR in the EG?
        Yes Kevin and Pinaki are both on the EG. I'll get one of them to comment on this issue early next week.

        > I'd rather like to get this confirmed before we implement it the wrong way.
        Agreed.

        Show
        Rick Curtis added a comment - > Do we have someone representing the ASF on this JSR in the EG? Yes Kevin and Pinaki are both on the EG. I'll get one of them to comment on this issue early next week. > I'd rather like to get this confirmed before we implement it the wrong way. Agreed.
        Hide
        Mark Struberg added a comment -

        Do we have someone representing the ASF on this JSR in the EG? If not I could ping a few guys on the EG or even ask on the eclipselink list.
        I'd rather like to get this confirmed before we implement it the wrong way.

        Show
        Mark Struberg added a comment - Do we have someone representing the ASF on this JSR in the EG? If not I could ping a few guys on the EG or even ask on the eclipselink list. I'd rather like to get this confirmed before we implement it the wrong way.
        Hide
        Rick Curtis added a comment -

        > I'm very interested why OpenJPA currently loads the dirty fields from the database (in a manner which in the logs looks like it's doing multiple database interactions!) instead only loading the version (if present).
        It sounds like a bug.

        > a @PostLoad must get called after loading from the database.
        The spec says that @PostLoad must get called after data gets loaded from the database INTO the persistence context. If the provider is loading from the DB, but it isn't going into the context, I don't think we need to do the callback.

        Also, since OpenJPA has this smart DetachedStateManager, we shouldn't need to load anything from the database on merge. We currently do (as noted above) but I think we have a bug.

        Show
        Rick Curtis added a comment - > I'm very interested why OpenJPA currently loads the dirty fields from the database (in a manner which in the logs looks like it's doing multiple database interactions!) instead only loading the version (if present). It sounds like a bug. > a @PostLoad must get called after loading from the database. The spec says that @PostLoad must get called after data gets loaded from the database INTO the persistence context. If the provider is loading from the DB, but it isn't going into the context, I don't think we need to do the callback. Also, since OpenJPA has this smart DetachedStateManager, we shouldn't need to load anything from the database on merge. We currently do (as noted above) but I think we have a bug.
        Hide
        Mark Struberg added a comment -

        I'm very interested why OpenJPA currently loads the dirty fields from the database (in a manner which in the logs looks like it's doing multiple database interactions!) instead only loading the version (if present).

        Fact: a @PostLoad must get called after loading from the database. Thus I'd say that it needs to get called even if entities only got loaded partly. But the spec also defines that the entity from the db needs to get posted. Currently we get a mix between old and new values in @PostLoad which (I think we agree) is wrong.

        Also quoting from the spec posted above:
        "In the absence of Version columns there is no additional version checking done by the persistence provider runtime during the merge operation."

        I interpret is that in reverse it should be checked on #merge if an @Version is present.

        Show
        Mark Struberg added a comment - I'm very interested why OpenJPA currently loads the dirty fields from the database (in a manner which in the logs looks like it's doing multiple database interactions!) instead only loading the version (if present). Fact: a @PostLoad must get called after loading from the database. Thus I'd say that it needs to get called even if entities only got loaded partly. But the spec also defines that the entity from the db needs to get posted. Currently we get a mix between old and new values in @PostLoad which (I think we agree) is wrong. Also quoting from the spec posted above: "In the absence of Version columns there is no additional version checking done by the persistence provider runtime during the merge operation." I interpret is that in reverse it should be checked on #merge if an @Version is present.
        Hide
        Rick Curtis added a comment -

        @Mark -

        Thoughts on Kevin's last post?

        Thanks,
        Rick

        Show
        Rick Curtis added a comment - @Mark - Thoughts on Kevin's last post? Thanks, Rick
        Hide
        Mark Struberg added a comment -

        Hi! Just verified that @PostLoad gets fired in Hibernate. See the attached little maven project.

        Show
        Mark Struberg added a comment - Hi! Just verified that @PostLoad gets fired in Hibernate. See the attached little maven project.
        Hide
        Rick Curtis added a comment -

        > but it seems that EclipseLink and Hibernate both fire the @PostLoad event for a merge.
        Point noted.

        3.5.2 – "The PostLoad method for an entity is invoked after the entity has been loaded into the current persistence
        context from the database or after the refresh operation has been applied to it."

        Most of my argument is based off the part which says "from the database". In the case of this merge, OpenJPA has a DetachedStateManager so we shouldn't need to hit to DB to see which fields are dirty. I think we have a bug here where we hit the DB even though we don't need to.

        I believe that the other providers aren't working per the spec. Upon merge being called, they may need to hit the DB to compare merged fields with those in the DB to see which are dirty/clean... but that data isn't actually loaded into the persistence context. They are loading data for the sake of version checking, not for loading in the context.

        I'll note that Pinaki wrote a blog post[1] quite some time back doing something very similar. It might be worth a read. It looks like the formatting is messed up so let me know if you want the full text.

        Quite the hair splitting for this early in the morning

        [1] http://webspherepersistence.blogspot.com/2009/01/auditing-with-openjpa.html

        Show
        Rick Curtis added a comment - > but it seems that EclipseLink and Hibernate both fire the @PostLoad event for a merge. Point noted. 3.5.2 – "The PostLoad method for an entity is invoked after the entity has been loaded into the current persistence context from the database or after the refresh operation has been applied to it." Most of my argument is based off the part which says "from the database". In the case of this merge, OpenJPA has a DetachedStateManager so we shouldn't need to hit to DB to see which fields are dirty. I think we have a bug here where we hit the DB even though we don't need to. I believe that the other providers aren't working per the spec. Upon merge being called, they may need to hit the DB to compare merged fields with those in the DB to see which are dirty/clean... but that data isn't actually loaded into the persistence context. They are loading data for the sake of version checking, not for loading in the context. I'll note that Pinaki wrote a blog post [1] quite some time back doing something very similar. It might be worth a read. It looks like the formatting is messed up so let me know if you want the full text. Quite the hair splitting for this early in the morning [1] http://webspherepersistence.blogspot.com/2009/01/auditing-with-openjpa.html
        Hide
        Kevin Sutter added a comment -

        Hi guys,
        My two cents worth... I don't think the spec indicates that the @PostLoad should get fired on a merge() operation. I do agree that if the merge() operation requires loading from the database, then the @PostLoad should get fired. But, I don't see in the spec where it says that the merge() operation demands a load from the database (and thus the @PostLoad).

        Section 3.2.7.1 states:

        "If X is a detached entity, the state of X is copied onto a pre-existing managed entity instance X'
        of the same identity or a new managed copy X' of X is created."

        Since OpenJPA utilizes a "detached state manager", that would constitute a pre-existing managed entity instance and there would be no requirement to go to the database. If you think about it, this would be a good thing to avoid extra, unnecessary trips to the database.

        Now, if your detached entity does not use a "detached state manger" [1], then we probably have to go to the database (like the other providers) to load up the current state into a new managed copy and then merge in the updated values. From side conversations with Rick, this processing may not be working quite right, but that's how I am reading the spec and the scenario described here.

        The other interesting piece from 3.2.7.1 is this:

        "Any Version columns used by the entity must be checked by the persistence runtime implementation
        during the merge operation and/or at flush or commit time. In the absence of Version columns there is
        no additional version checking done by the persistence provider runtime during the merge operation."

        Earlier comments by Rick seemed to indicate that the absence of an @Version field would be a reason to access the database during merge(). This paragraph seems to indicate that this version checking should not be done during merge()...

        This looks to be a good problem. I think we have an issue or two to resolve here. We just need to come to a consensus.

        Thanks,
        Kevin

        [1] http://openjpa.apache.org/builds/2.0.1/apache-openjpa-2.0.1/docs/manual/manual.html#ref_guide_detach_state

        Show
        Kevin Sutter added a comment - Hi guys, My two cents worth... I don't think the spec indicates that the @PostLoad should get fired on a merge() operation. I do agree that if the merge() operation requires loading from the database, then the @PostLoad should get fired. But, I don't see in the spec where it says that the merge() operation demands a load from the database (and thus the @PostLoad). Section 3.2.7.1 states: "If X is a detached entity, the state of X is copied onto a pre-existing managed entity instance X' of the same identity or a new managed copy X' of X is created." Since OpenJPA utilizes a "detached state manager", that would constitute a pre-existing managed entity instance and there would be no requirement to go to the database. If you think about it, this would be a good thing to avoid extra, unnecessary trips to the database. Now, if your detached entity does not use a "detached state manger" [1] , then we probably have to go to the database (like the other providers) to load up the current state into a new managed copy and then merge in the updated values. From side conversations with Rick, this processing may not be working quite right, but that's how I am reading the spec and the scenario described here. The other interesting piece from 3.2.7.1 is this: "Any Version columns used by the entity must be checked by the persistence runtime implementation during the merge operation and/or at flush or commit time. In the absence of Version columns there is no additional version checking done by the persistence provider runtime during the merge operation." Earlier comments by Rick seemed to indicate that the absence of an @Version field would be a reason to access the database during merge(). This paragraph seems to indicate that this version checking should not be done during merge()... This looks to be a good problem. I think we have an issue or two to resolve here. We just need to come to a consensus. Thanks, Kevin [1] http://openjpa.apache.org/builds/2.0.1/apache-openjpa-2.0.1/docs/manual/manual.html#ref_guide_detach_state
        Hide
        Mark Struberg added a comment - - edited

        but it seems that EclipseLink and Hibernate both fire the @PostLoad event for a merge.
        And albeit it is not 100% clear from the spec, there is a lot evidence that this behaviour is intended.

        edit: the wording from the PostLoad + merge (3.2.7.1) + 'managed instance' in the spec indicate that @PostLoad should get fired if the merging gets called on a detached entity which exists in the database.

        Show
        Mark Struberg added a comment - - edited but it seems that EclipseLink and Hibernate both fire the @PostLoad event for a merge. And albeit it is not 100% clear from the spec, there is a lot evidence that this behaviour is intended. edit: the wording from the PostLoad + merge (3.2.7.1) + 'managed instance' in the spec indicate that @PostLoad should get fired if the merging gets called on a detached entity which exists in the database.
        Hide
        Rick Curtis added a comment -

        Unfortunately option a.) gets my vote.

        Theoretically we shouldn't be hitting the DB at all for this Entity since we have a DetachedStateManager and we know which fields are dirty... so there isn't a need to load anything.

        Show
        Rick Curtis added a comment - Unfortunately option a.) gets my vote. Theoretically we shouldn't be hitting the DB at all for this Entity since we have a DetachedStateManager and we know which fields are dirty... so there isn't a need to load anything.
        Hide
        Mark Struberg added a comment -

        I did further debugging.

        In the DetachedStateManager #165 (right after the dirty check and reload of all those fields: Object origVersion = sm.getVersion(); ) the sm (StateManagerImpl) contains the correct entity values from the db in _pc!

        of course they get replaced with the values from toAttach before the @PostLoad gets fired.

        There are 2 options now:

        a) dont fire the @PostLoad at all for EM#merge - because the load data in the event is definitely wrong currently
        b) first merge the non-dirty fields back to the sm._pc before firing the @PostLoad event and then do the toAttach merge.

        I'd obviously prefer option b

        Show
        Mark Struberg added a comment - I did further debugging. In the DetachedStateManager #165 (right after the dirty check and reload of all those fields: Object origVersion = sm.getVersion(); ) the sm (StateManagerImpl) contains the correct entity values from the db in _pc! of course they get replaced with the values from toAttach before the @PostLoad gets fired. There are 2 options now: a) dont fire the @PostLoad at all for EM#merge - because the load data in the event is definitely wrong currently b) first merge the non-dirty fields back to the sm._pc before firing the @PostLoad event and then do the toAttach merge. I'd obviously prefer option b
        Hide
        Mark Struberg added a comment -

        Rick, the merge acutally does much more than simply loading the version field. It additionally loads all fields which are marked as 'dirty' from the database.

        From the spec 3.1.1 EntityManager Interface
        Merge the state of the given entity into the
        current persistence context.
        @return the managed instance that the state was merged to

        and 'managed instance' is a well defined term in this spec which afaik means an entity which got loaded from the database.

        Show
        Mark Struberg added a comment - Rick, the merge acutally does much more than simply loading the version field. It additionally loads all fields which are marked as 'dirty' from the database. From the spec 3.1.1 EntityManager Interface Merge the state of the given entity into the current persistence context. @return the managed instance that the state was merged to and 'managed instance' is a well defined term in this spec which afaik means an entity which got loaded from the database.
        Hide
        Rick Curtis added a comment -

        My first thoughts on this are that the merge operation shouldn't result in a postLoad() callback... Nothing from the DB is actually being placed in the persistence context.

        The only reason OpenJPA is hitting the DB is to see if the merged Entity is dirty. In the case where there is no version, we need to hit the DB to compare the dirty fields to those in the DB. In the case where there is a version field, we will only load that field... not the entire Entity.

        Show
        Rick Curtis added a comment - My first thoughts on this are that the merge operation shouldn't result in a postLoad() callback... Nothing from the DB is actually being placed in the persistence context. The only reason OpenJPA is hitting the DB is to see if the merged Entity is dirty. In the case where there is no version, we need to hit the DB to compare the dirty fields to those in the DB. In the case where there is a version field, we will only load that field... not the entire Entity.
        Hide
        Mark Struberg added a comment -

        this unit test shows the problem. If you comment out value2 .. value12 in PostLoadListenerEntity.java, then all works fine.

        In the meantime: is there a config option to disable this optimisation behaviour?

        Show
        Mark Struberg added a comment - this unit test shows the problem. If you comment out value2 .. value12 in PostLoadListenerEntity.java, then all works fine. In the meantime: is there a config option to disable this optimisation behaviour?

          People

          • Assignee:
            Mark Struberg
            Reporter:
            Mark Struberg
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development