OPENJPA-464

Performance improvement with Statement Batching support

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.1.0
    • Component/s: kernel
    • Labels:
      None

      Description

      The current OpenJPA implementation does not provide SQL statement batching: every SQL statement is executed against the database one at a time, so runtime performance suffers from the large number of database round trips. The JDBC specification provides batching for insert, update, and delete statements through the addBatch() and executeBatch() APIs. We should take advantage of this capability to support SQL statement batching in OpenJPA.

      According to the old version of the OpenJPA manual (i.e., Kodo), statement batching was part of the original feature set; BEA made a conscious decision not to contribute this function back to OpenJPA. We can still use that information as the basis of the implementation, with some modifications.

      I have completed the work for this statement batching support. The patch has been tested with the CTS against Derby and DB2, with the OpenJPA regression tests, and with our internal FVT test bucket. The following section describes the design and implementation. I have also attached the full design document and the patch to this JIRA. Once the design and implementation are accepted, I will update the OpenJPA manual to cover this function. Thanks,

      Design and implementation:
      • Configuration:
      o Batch Limit value:
       0 - Disable batch support.
       -1 - Unlimited number of statements for a batch.
       Any positive number - Maximum number of statements for a batch.
      o By default, each dictionary defines its own default batch limit. Currently only the DB2 and Oracle dictionaries set the default batch limit to 100; the default for all other dictionaries is zero (disabled).
      o To enable batch support, the user can specify the following property in the persistence.xml file:
      <property name="openjpa.jdbc.DBDictionary" value="BatchLimit=25"/>
      or
      <property name="openjpa.jdbc.DBDictionary"
      value="db2(batchLimit=25)"/>

      • The basic design is to cache all insert/update/delete statements during the execution of the PreparedStatementManagerImpl.flushInternal() method. A cache structure based on a LinkedHashMap maintains the order of the SQL statements for execution:
      o _cacheSql - a LinkedHashMap that stores the rows associated with each PreparedStatement. Key: SQL statement string; Value: array list of rows.

      During PreparedStatementManagerImpl.flush(), the implementation walks this cache, prepares each SQL statement, adds its rows to the batch, and executes the batch when the batch limit is reached or all rows for that statement have been processed (as sketched below). The update counts are validated after executeBatch().
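
      For illustration only, a minimal sketch of this batching loop using plain JDBC; the field and helper names below are simplified stand-ins, not the actual PreparedStatementManagerImpl code:

        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import java.sql.SQLException;
        import java.util.LinkedHashMap;
        import java.util.List;
        import java.util.Map;

        class BatchFlushSketch {
            // Simplified stand-in for _cacheSql: insertion order is preserved so the
            // statements execute in the order they were cached.
            // Key: SQL string; value: the parameter rows queued for that statement.
            private final Map<String, List<Object[]>> cacheSql =
                new LinkedHashMap<String, List<Object[]>>();

            void flush(Connection conn, int batchLimit) throws SQLException {
                for (Map.Entry<String, List<Object[]>> entry : cacheSql.entrySet()) {
                    PreparedStatement ps = conn.prepareStatement(entry.getKey());
                    try {
                        int queued = 0;
                        for (Object[] row : entry.getValue()) {
                            for (int i = 0; i < row.length; i++)
                                ps.setObject(i + 1, row[i]);
                            ps.addBatch();
                            queued++;
                            // 0 = disabled (handled before reaching this path),
                            // -1 = unlimited, positive = flush whenever the cap is hit.
                            if (batchLimit > 0 && queued == batchLimit) {
                                checkUpdateCount(ps.executeBatch());
                                queued = 0;
                            }
                        }
                        if (queued > 0)
                            checkUpdateCount(ps.executeBatch());
                    } finally {
                        ps.close();
                    }
                }
            }

            private void checkUpdateCount(int[] updateCounts) throws SQLException {
                // validate the counts returned by executeBatch(); see the sketch
                // after the design list for the three cases handled there
            }
        }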

      • If the batch limit is 0 (disabled), the statement is executed through the normal, non-batched path. The same rule applies to a statement that has only one row: it is executed normally.
      • Batching is disabled when the primary key is generated with the identity strategy. With GenerationType.IDENTITY, the generated ID value must be retrieved immediately so the in-memory entity can use it; therefore such statements cannot be batched.
      • Batch exception processing: checkUpdateCount() validates the batch results after executeBatch(). According to the JDBC javadoc, there are three cases to consider (see the sketch after this list):
      o Case EXECUTE_FAILED (-3):
       This is a failure case. If the action is UPDATE or there is a FailedObject, it is treated as an OptimisticException; otherwise a SQLException is thrown.
       This matches the current (non-batched) implementation.
      o Case SUCCESS_NO_INFO (-2):
       This is treated as success and the information is logged.
      o Case 0:
       If there is a FailedObject or the action is INSERT, a SQLException is thrown; otherwise it is treated as success.
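
      For illustration only, a sketch of how these three update-count cases might be distinguished after executeBatch(); the action constants and exception choices below are simplified stand-ins for the actual checkUpdateCount() in the patch:

        import java.sql.SQLException;
        import java.sql.Statement;

        class UpdateCountCheckSketch {
            // placeholder action codes standing in for the row-action constants in the patch
            static final int ACTION_INSERT = 0, ACTION_UPDATE = 1, ACTION_DELETE = 2;

            void checkUpdateCount(int[] counts, int action, boolean hasFailedObject)
                    throws SQLException {
                for (int count : counts) {
                    switch (count) {
                    case Statement.EXECUTE_FAILED:   // -3: this statement failed
                        if (action == ACTION_UPDATE || hasFailedObject)
                            throw new SQLException("treated as an optimistic lock failure");
                        throw new SQLException("batched statement failed");
                    case Statement.SUCCESS_NO_INFO:  // -2: success, row count unknown
                        System.out.println("batch statement succeeded with no update-count info");
                        break;
                    case 0:                          // no rows affected
                        if (hasFailedObject || action == ACTION_INSERT)
                            throw new SQLException("no rows affected by batched statement");
                        break;                       // otherwise treated as success
                    default:                         // positive count: normal success
                        break;
                    }
                }
            }
        }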

      1. statement batch design_1211.doc
        42 kB
        Teresa Kan
      2. OPENJPA-464.patch
        26 kB
        Teresa Kan
      3. OPENJPA-464.patch2
        31 kB
        Teresa Kan
      4. OPENJPA-464.patch3
        5 kB
        Teresa Kan

        Activity

        Teresa Kan added a comment -

        Attached the design document and the patch. The changes are mainly in PreparedStatementManagerImpl, DBDictionary, DB2Dictionary, and OracleDictionary.
        Currently only DB2 and Oracle take advantage of this support. If another database wants it, update its dictionary to set batchLimit to some number, following the changes in the DB2 and Oracle dictionaries.
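
        For illustration only (not part of the attached patch), a dictionary that wants this support might set the limit in its constructor, assuming the batchLimit setting this patch introduces on DBDictionary:

          import org.apache.openjpa.jdbc.sql.DBDictionary;

          // Hypothetical dictionary, shown only to illustrate the pattern described above.
          // It assumes the batchLimit setting that this patch adds to DBDictionary.
          public class MyDatabaseDictionary extends DBDictionary {
              public MyDatabaseDictionary() {
                  batchLimit = 100; // enable batching, at most 100 statements per batch
              }
          }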

        Teresa Kan added a comment -

        Attached the patch, including the OpenJPA documentation changes.

        Michael Dick added a comment -

        Hi Teresa,

        A couple quick comments on the patch:

        1. The implementations of getDefaultBatchLimit() for DB2Dictionary and OracleDictionary are slightly different. Further it looks like the DB2 dictionary doesn't allow the batchLimit to be set to UNLIMITED. Is that a known issue with DB2? If so we should add that to the documentation and possibly add a message indicating that we're overriding a user setting.

        On the other hand if this is not intended behavior then the dictionaries should be able to share a common method in DBDictionary. Ideally the subclass dictionaries would only need to set the default batch limit to some value, rather than re-implementing a method.

        2. Minor whitespace issue:

          + }
          +
          +
          + private void processSql(String sql, RowImpl row) throws SQLException
          + {
          +     ArrayList temprow;

        3. Please add a comment to the empty catch block in the execute method:

          + } catch (SQLException se) {
          + }
        That's it for now. The only critical issue is the first one.

        Teresa Kan added a comment -

        I redid the fix based on Pinaki's suggestion to use the UpdateManager plugin. Instead of extending AbstractUpdateManager, BatchingConstraintUpdateManager now extends ConstraintUpdateManager:

        public class BatchingConstraintUpdateManager
        extends ConstraintUpdateManager {

        I also made this update manager the default batch update manager. Users can still plug in their own batching mechanism through the openjpa.jdbc.UpdateManager property:

        <property name="openjpa.jdbc.UpdateManager"
        value="org.apache.openjpa.jdbc.kernel.YourOperationOrderUpdateManager" />

        Pinaki Poddar added a comment -

        Hello Teresa,
        Thank you for considering separating the batching logic into a separate UpdateManager.

        Comments on the patch:
        1. The two separate methods on DBDictionary, validateBatchProcess(...) and validateDBSpecificBatchProcess(...), are not necessary. A single validateBatchProcess(...) is sufficient. If a specific database wants to augment or change the validation logic, it should override this method, calling super.validateBatchProcess(...) if necessary.

        2. A single 'batchLimit' should be able to account for the different situations. Instead I see two separate variables and some logic that does not appear necessary, though I have not combed through it. Maybe you should elaborate on the intention of BATCH_LIMIT_NOT_SET and defaultBatchLimit.

        3. As you are changing the default UpdateManager, it may be worthwhile to mention in the documentation that the alias 'constraint' will activate ConstraintUpdateManager.

        Pinaki Poddar added a comment -

        I read your original posting again to see your usage of the batch limit.
        Why not have a single batch-limit bean-style property defined in DBDictionary? Also, in DBDictionary.validateBatchProcess():

          if (getBatchLimit() < -1)
              return false;
          // other auto-assign logic

        1. You will not need to define a batch limit for individual database dictionaries.
        2. The concept of BATCH_LIMIT_NOT_SET is not required. Just interpret any number less than -1 as an indication that batching is switched off, and interpret -1 as the maximum.

        Teresa Kan added a comment -

        Appended patch2 based on Pinaki's suggestions:
        1) Restructured BatchingConstraintUpdateManager to extend ConstraintUpdateManager.
        2) The default batch limit is 0, but it can be overridden by each DB dictionary's constructor. Currently only the DB2Dictionary and OracleDictionary constructors set the default limit to 100; the rest of the dictionaries remain at zero.

        Thanks,
        Teresa

        Teresa Kan added a comment -

        Provided the batching capability for the OperationOrderUpdateManager. Attached the patch for this support.

        Yves Galante added a comment -

        In your document you write:
        "Batch exception process: a checkUpdateCount() is used to validate the batch process after the executeBatch()."
        and later:
        "Case of SUCCESS_NO_INFO: (-2): We treat this as successful case and log the info in the log"

        Oracle drivers return -2.

        We have 3 choices:
        1) Use batch updates with Oracle only for INSERT statements (if -2 is returned for an insert it does not matter, since an insert cannot affect more than one row, so all is right). We would need additional configuration parameters: one for update/delete and one to activate insert batch mode.
        2) Use the "Oracle batching mode", with the special Oracle classes in the ojdbc5.jar driver; that batching returns the correct number of updated rows. But then ojdbc5.jar is needed to build the project.
        3) Execute a SELECT to count the number of rows updated or deleted:
        For update: select count where id = XX and version = XX
        For delete: select count where id = XX

          People

          • Assignee: Michael Dick
          • Reporter: Teresa Kan
          • Votes: 0
          • Watchers: 1
