OPENJPA-2139

OpenJPA fails to recover from a broken database on startup

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.4.0
    • Component/s: None
    • Labels: None

      Description

      The following scenario:

      1.) turn off the database
      2.) perform a query against the database
      3.) turn on the database
      4.) try to re-run the query from 2.)

      In 4.) you will get the following Exception:

      openjpa-2.2.0-r422266:1244990 nonfatal user error> org.apache.openjpa.persistence.ArgumentException: An error occurred while parsing the query filter "SELECT k FROM DbEnumKey AS k where k.type=:typ ORDER BY k.ordinal". Error message: The name "DbEnumKey" is not a recognized entity or identifier. Known entity names: []

      Basically the whole app is stale afterwards!

      Solution: caching the entities should only be done once a connection can be established.

      Attachments

      1. OPENJPA-2139.mdr.patch
        5 kB
        Rick Curtis
      2. OPENJPA-2139.patch
        3 kB
        Mark Struberg
      3. OPENJPA-2139-1.2.x.patch
        8 kB
        Heath Thomann
      4. OPENJPA-2139-1.2.x-v2.patch
        12 kB
        Heath Thomann

        Activity

        Mark Struberg added a comment -

        Additional info: everything recovers fine if the database was successfully accessed at least once before it dies and gets restarted.

        Mark Struberg added a comment -

        please review this patch!

        I found that I need to wait a bit (50 ms), because going down to the connector would just trash them again. This is certainly not a perfect solution; if you have a better idea, please speak up!

        Rick Curtis added a comment -

        Attaching a patch that fixes the root issue.

        Rick Curtis added a comment -

        Background details: the runtime isn't able to determine which type of DB we're running against based on any of the connection properties, so we need to make a connection to figure it out.

        The root problem is that when creating the first EM, in AbstractBrokerFactory we have this [1] block of code that gets an MDR instance and then configures it as a class listener. As part of configuring the MDR, we get a reference to the DBDictionary [2]. While instantiating the dictionary, we need to connect to the DB to figure out which type to create. Instantiation of the dictionary results in an exception, which blows out of the AbstractBrokerFactory.makeReadOnly() method. Thus we never register the MDR as a class listener, and so the MDR never gets populated with metadata.

        To work around the problem, I modified the MDR so that we will only instantiate a DBDictionary when we actually use it. Mark tested this patch in his environment and it seemed to work fine.

        [1]
        MetaDataRepository repos = _conf.getMetaDataRepositoryInstance();
        repos.setValidate(MetaDataRepository.VALIDATE_RUNTIME, true);
        repos.setResolve(MetaDataModes.MODE_MAPPING_INIT, true);
        PCRegistry.addRegisterClassListener(repos);

        [2] MappingRepository.endConfiguration()
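The lazy-instantiation idea can be sketched as follows. This is a hypothetical, self-contained illustration, not the actual patch: the class and field names are invented, and the Supplier stands in for JDBCConfiguration.getDBDictionaryInstance(). The key property is that a failed attempt leaves the holder empty, so a later caller retries once the DB is back up.

```java
import java.util.function.Supplier;

// Hypothetical sketch: defer a connection-dependent lookup (like the MDR's
// DBDictionary reference) until first use, so a DB outage during startup
// no longer blows out of makeReadOnly().
class LazyDictionary<T> {
    private final Supplier<T> factory;   // e.g. conf::getDBDictionaryInstance
    private T dict;

    LazyDictionary(Supplier<T> factory) {
        this.factory = factory;
    }

    // A failed attempt (DB down) leaves the field null, so the next
    // caller simply retries instead of being stuck forever.
    synchronized T get() {
        if (dict == null) {
            dict = factory.get();        // may throw while the DB is down
        }
        return dict;
    }
}
```

Once the first get() succeeds, subsequent calls return the cached instance without touching the DB again.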

        Mark Struberg added a comment -

        I'm not sure what I did differently from yesterday's run, but the patch doesn't seem to solve the problem in all situations.

        If I start my webapp with an offline database and wait until the request dies down completely, then start the db and issue a 2nd request, I get the original error indicating that my aliases are empty.

        Rick, I fear I forgot to 'undeploy' my own fix prior to testing your patch yesterday.

        Rick Curtis added a comment -

        Mark – I was afraid that the patch might fail in a different area. Can you post a stack from when we try to create the first EM? I assume that we get an exception from some other method that AbstractBrokerFactory.makeReadOnly() calls.

        Albert Lee added a comment -

        There is another twisted use case in the JEE environment.

        In the app server container, when the EMF is created, it will try to create a transformer attached to the PU. If the data source connection is still not available, OpenJPA will throw

        openjpa.Runtime: Warn: An error occurred while registering a ClassTransformer with.....

        and any entity loaded by the associated application classloader will NOT be enhanced. Eventually, this will yield the same ArgumentException.

        Albert Lee

        Mark Struberg added a comment -

        That might be the case. In my scenario I'm working with compile-time enhancement.

        Rick Curtis added a comment -

        Albert -

        In the case that you mentioned, do you know what/where the exception is that we encounter?

        Heath Thomann added a comment -

        Hey Rick! Let me try to answer the question you posed previously in response to Albert's comments. If you look at PersistenceProviderImpl.createContainerEntityManagerFactory you can see we have this code:

        try {
            pui.addTransformer(new ClassTransformerImpl(cp, ctOpts,
                pui.getNewTempClassLoader(), newConfigurationImpl()));
        } catch (Exception e) {
            // fail gracefully
            transformerException = e;
        }

        Within the 'addTransformer' call, we are doing a bunch of stuff, which includes instantiating a 'ClassTransformerImpl' (CTI). You'll have to take a bit of a leap of faith and trust me when I say that as part of instantiating the CTI, a call is eventually made to 'DBDictionaryFactory.newDBDictionary'. Within 'newDBDictionary', a connection to the DB is made. If an exception is thrown when making the connection (e.g. the DB is down), the exception flows all the way back to the try/catch in the above code block. That means we never add a transformer, so later at runtime we'd see the ArgumentException given the lack of transformer/enhancement registration. Furthermore, as you can see in the catch block, we attempt to "fail gracefully", which in my environment means there is little indication of the fact that we didn't register a transformer. So the patch you added previously doesn't work in this case, because it does nothing to handle the case where we go down the 'createContainerEntityManagerFactory' path. I'm working on a patch which will "eat" any connection errors at the point 'DBDictionaryFactory.newDBDictionary' is called (i.e. JDBCConfigurationImpl.getDBDictionaryInstance); the code will be gated by a system property, of course.
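The gated "eat the connection error" approach Heath describes could look roughly like this. Everything here is an invented sketch: the property name, the helper class, and the factory interface are illustrative stand-ins, not the actual patch. Note Rick's later objection applies directly: when the gate is on, callers receive null and must tolerate it.

```java
// Hypothetical sketch of a system-property-gated handler around the point
// where the DB connection is first attempted. The property name below is
// invented for illustration; the real patch may use a different switch.
class GatedDictionaryLookup {
    static final String GATE = "openjpa.test.EatDictionaryConnectionErrors";

    interface DictionaryFactory {
        Object newDBDictionary();   // stands in for DBDictionaryFactory.newDBDictionary
    }

    // Returns the dictionary, or null when the gate is set and the
    // connection attempt failed; otherwise the failure propagates as before.
    static Object getDictionary(DictionaryFactory f) {
        try {
            return f.newDBDictionary();
        } catch (RuntimeException e) {
            if (Boolean.getBoolean(GATE)) {
                System.err.println("WARN: cannot connect to DB, deferring "
                        + "dictionary creation: " + e.getMessage());
                return null;        // every caller must now handle null
            }
            throw e;                // default behaviour: fail as before
        }
    }
}
```

The design trade-off is visible in the null return: swallowing the exception keeps startup alive, but pushes a null-check obligation onto every caller of the lookup.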

        IMHO, it seems OpenJPA needs to take a long hard look at these DB up/down scenarios. In what little testing has been done via this JIRA, we've unearthed a few areas needing to be changed. What other issues could be lurking as further testing is done? That is, could we be opening a can of worms by fixing this limited scenario, given OpenJPA doesn't appear to do much/any testing in this area (one could think of a million DB up/down scenarios and timing windows)? Anyway, I'm always the pessimist...

        Thanks,

        Heath Thomann

        Mark Struberg added a comment -

        I've now committed a fix for the original issue. We now reset the _readOnly flag if we hit an Exception during initialisation. That way we will try again until the db is up again.

        This will not yet fix the problem with dynamic enhancement! But since runtime enhancement is mostly considered problematic anyway at the moment, it will be good enough for most users.
        Shouldn't we create a separate JIRA issue for the dynamic enhancement problem?
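The committed fix, as Mark describes it, can be sketched in miniature. This is a hypothetical illustration of the retry pattern only; the class and interface names are invented and the real AbstractBrokerFactory does considerably more.

```java
// Hypothetical sketch of the fix: if one-time initialisation fails
// (e.g. the DB is down), the _readOnly flag is reset so the next request
// retries the whole initialisation instead of leaving the factory stale.
class RetryableReadOnly {
    private boolean _readOnly = false;

    interface Init { void run(); }      // stands in for the real init work

    synchronized void makeReadOnly(Init init) {
        if (_readOnly)
            return;                     // already initialised successfully
        _readOnly = true;
        try {
            init.run();                 // may touch the DB and throw
        } catch (RuntimeException e) {
            _readOnly = false;          // the fix: allow a later retry
            throw e;
        }
    }

    synchronized boolean isReadOnly() { return _readOnly; }
}
```

Before the fix, the flag would stay set after a failure, which matches the reported symptom: the app stays stale even after the DB comes back.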

        Heath Thomann added a comment -

        Will provide detailed comments about this patch shortly.

        Thanks,

        Heath

        Heath Thomann added a comment -

        Hello Mark! I worked with Albert and Rick on this issue, and we came up with a slightly different change than yours, but I think it is effectively the same idea. Please review the patch I just uploaded; it is named 'OPENJPA-2139-1.2.x.patch'. First, I used the OPENJPA-2139.mdr.patch which Rick provided a while back. This did not fix some of the issues Albert and I described in later posts to this JIRA (i.e. registration of a transformer). So our approach was to catch, and eat, the exception at the point we actually attempt to connect to the DB (as well as log a warning message), specifically in JDBCConfigurationImpl.getDBDictionaryInstance. We took the position that there is no need to allow a connection/SQLException to percolate up the call chain; we figured it best to catch the exception, eat it, and warn the user of the connection issue. In this way, and with Rick's previous patch, an attempt to connect to the DB can be made at a later time (when hopefully the DB is back up).

        I noticed that you changed AbstractBrokerFactory.makeReadOnly. This method eventually causes (indirectly) JDBCConfigurationImpl.getDBDictionaryInstance to be called; when the DB is down, getDBDictionaryInstance will cause an exception to be thrown. It appears the 'catch' block you added to AbstractBrokerFactory.makeReadOnly will catch and re-throw this exception as well as reset some state. With my fix, you will no longer receive an exception in some cases (i.e. for the DB connection case where we catch/eat the exception).

        I would like you to try my patch in your testing environment to see if it fixes your issue. At the very least I'd like you to describe your scenario so we can understand why you made the changes you made. Our fix has been tested in a JEE environment, whereas I believe you are in a JSE env, right? I've tested a few scenarios in my JEE server where I've cycled the DB at various times, and our fix handles these cases as we'd expect. Furthermore, our fix has been tested by an internal group (who uses our JEE server) which is performing some very rigorous DB failover scenarios, and our change fixes their issues. So I think our fix is necessary, but the remaining question is whether or not your fix is needed in addition to ours.

        Thanks,

        Heath

        Mark Struberg added a comment -

        Hi Heath!

        Thanks for doing a review and providing additional work. I just came back from a conference and hope I can test your patch early next week. I've only looked at your patch quickly and need to take more time to grok the details.

        Just to be sure that we aim into the same direction:

        a.) aborting the app in case a DB is not available is NOT an option. It is a standard case in big apps that the db is not always available. E.g. on cluster failover (cold standby), the app needs to continue after the db is back up again.

        b.) there are quite a few different scenarios we need to take care of
        1.) build-time enhancement
        2.) runtime enhancement
        3.) subclassing enhancement
        4.) configuration via PersistenceUnitInfo vs persistence.xml
        etc. Please add other testing scenarios where this could have an impact. In our app we use build-time enhancement.

        c.) I'm not sure if connecting-for-dictionary-failed is practicable. IMO it isn't if you have to deal with multiple databases depending on the installation.

        d.) we must be careful to not introduce a chicken-egg problem

        Heath Thomann added a comment -

        Hi Mark! Just wanted to add a quick update here. Rick reviewed my patch and yours (your committed changes to trunk, that is). He feels that your fix/commit is definitely necessary in addition to a portion of the last patch I submitted. The changes to MappingRepository in my previous patch are necessary. However, he raised a very valid point that my eating of an exception in 'JDBCConfigurationImpl.getDBDictionaryInstance' would allow a null DBDictionary instance to be returned where an exception used to be thrown; there are a lot of callers of JDBCConfigurationImpl.getDBDictionaryInstance throughout the OpenJPA code, and as such we can't be sure each caller accounts for a null instance being returned. However, when I test with your commit plus the changes to MappingRepository, it still does not allow us to register a transformer, because as part of the registration we go down the path to JDBCConfigurationImpl.getDBDictionaryInstance, which yields an exception when the DB is down, thus causing the registration not to occur. We have some ideas for a modified fix which will consist of following a similar pattern in MappingDefaultsImpl as was done in MappingRepository (i.e. delay getting a reference to the DBDictionary). I will work on a fix today and hopefully have an updated patch later today (the patch will include your changes).

        Heath Thomann added a comment -

        Providing a second version of my previous patch. We've taken a slightly different approach in this version of the fix, in that we will no longer eat an SQLException caused in 'JDBCConfigurationImpl.getDBDictionaryInstance'. Rather, a similar pattern was followed in MappingDefaultsImpl as was done in MappingRepository (i.e. delaying getting a reference to the DBDictionary).

        Mark Struberg added a comment -

        Hi Heath, did you apply the patch already? I'm now finally back on working on this stuff again. Any updates?

        Mark Struberg added a comment -

        PS: your IDE seems to add TABS! Please switch your editor to SPACES ONLY.

        It is generally a good idea to switch your IDE to 'show whitespace' for all Apache projects. Otherwise we will end up with a completely messed-up tabs/spaces mixture.

        Mark Struberg added a comment -

        another note:

        public DBDictionary getDBDictionary() {
            if (dict == null) {
                dict = ((JDBCConfiguration) conf).getDBDictionaryInstance();
            }
            ...

        do we need some synchronisation or DCL here? I've not yet checked if this could happen concurrently. Anyone knows this?

        Mark Struberg added a comment -

        I tried to apply the patch manually, but it just doesn't fit. Against which branch did you make the changes? Please note that I did a quick commit removing some unused imports and fixing a few JavaDocs in MappingDefaultsImpl. But the patch also didn't apply before that.

        Mark Struberg added a comment -

        Heath, any update on this?


          People

          • Assignee: Mark Struberg
          • Reporter: Mark Struberg
          • Votes: 0
          • Watchers: 1
