Derby
  1. Derby
  2. DERBY-1876

Investigate overhead of JDBC layer and compiled activation code for simple embedded read-only, forward ResultSets

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 10.6.1.0
    • Fix Version/s: 10.6.1.0
    • Component/s: JDBC
    • Labels:
      None
    • Bug behavior facts:
      Performance

      Description

      For simple ResultSet usage like:
      ResultSet rs = ps.executeQuery();
      while (rs.next())

      { rs.getInt(1); rs.getInt(2); rs.getInt(3); }

      rs.close();

      it would be interesting to see how much overhead could be removed with simple changes, or possibly removed if there was a simple ResultSet implementation for forward only, read-only ResultSet, and the more complete implementation for all other ResultSet types such as updateable and/or scrollable. Has introducing updateable ResultSets, for example, degraded the performance of read-only ResultSets? Could code be changed so that a typical read-only Resultset is not affected by the code required for richer ResultSets?

      1. derby1876.java
        1 kB
        Daniel John Debrunner
      2. derby1862.java
        2 kB
        Daniel John Debrunner
      3. timeout_colcount.diff
        3 kB
        Knut Anders Hatlen
      4. ers_current_row.txt
        5 kB
        Daniel John Debrunner

        Issue Links

          Activity

          Hide
          Rick Hillegas added a comment -

          It seems that a fair amount of work has been done in this area. Closing until someone wants to re-open this issue and do more work.

          Show
          Rick Hillegas added a comment - It seems that a fair amount of work has been done in this area. Closing until someone wants to re-open this issue and do more work.
          Hide
          Daniel John Debrunner added a comment -

          patch ers_current_row.txt committed - svn revision 566311

          Show
          Daniel John Debrunner added a comment - patch ers_current_row.txt committed - svn revision 566311
          Hide
          Daniel John Debrunner added a comment -

          From an earlier comment:

          ------------------------------------------------------------------------------------------------
          4) this.currentRow = factory.getValueRow(columnCount);
          currentRow.setRowArray(null);

          The call to getValueRow() creates a ValueRow with a new DataValueDescriptor array within it, the next line nulls out the reference to the array. Thus this creates a short lived object for no value.
          ------------------------------------------------------------------------------------------------

          ers_current_row.txt is a patch that changes how currentRow is used in EmbedResultSet. Now it is null to indicate the ResultSet is not positioned on a row, or not null and it points to the currrent row. The current row in the resultset is directly pointed to, which is what previously occurred, though at the DVD[] level and not the Row level.

          Running tests on the patch.

          Show
          Daniel John Debrunner added a comment - From an earlier comment: ------------------------------------------------------------------------------------------------ 4) this.currentRow = factory.getValueRow(columnCount); currentRow.setRowArray(null); The call to getValueRow() creates a ValueRow with a new DataValueDescriptor array within it, the next line nulls out the reference to the array. Thus this creates a short lived object for no value. ------------------------------------------------------------------------------------------------ ers_current_row.txt is a patch that changes how currentRow is used in EmbedResultSet. Now it is null to indicate the ResultSet is not positioned on a row, or not null and it points to the currrent row. The current row in the resultset is directly pointed to, which is what previously occurred, though at the DVD[] level and not the Row level. Running tests on the patch.
          Hide
          Knut Anders Hatlen added a comment -

          I have opened a separate JIRA issue (DERBY-3004) for the finalizer in EmbedResultSet.

          Show
          Knut Anders Hatlen added a comment - I have opened a separate JIRA issue ( DERBY-3004 ) for the finalizer in EmbedResultSet.
          Hide
          Daniel John Debrunner added a comment -

          I wonder if it's possible to move the finalizer logic to EmbedStatement. Since a ResultSet will always be referenced by its Statement then we only need to close the ResultSet when it's statement is finalized. Then EmbedResultSet would not need a finalizer method.

          Show
          Daniel John Debrunner added a comment - I wonder if it's possible to move the finalizer logic to EmbedStatement. Since a ResultSet will always be referenced by its Statement then we only need to close the ResultSet when it's statement is finalized. Then EmbedResultSet would not need a finalizer method.
          Hide
          Knut Anders Hatlen added a comment -

          By commenting out the finalize() method in EmbedResultSet, the time to run the test on my laptop is reduced from ~3 seconds to ~1.3 seconds (OpenSolaris, Sun Java SE 6). Since the finalizer is a no-op if the activation isn't single-use, I think we should investigate whether there are ways to avoid this cost for ResultSets that don't need the finalizer.

          Show
          Knut Anders Hatlen added a comment - By commenting out the finalize() method in EmbedResultSet, the time to run the test on my laptop is reduced from ~3 seconds to ~1.3 seconds (OpenSolaris, Sun Java SE 6). Since the finalizer is a no-op if the activation isn't single-use, I think we should investigate whether there are ways to avoid this cost for ResultSets that don't need the finalizer.
          Hide
          Knut Anders Hatlen added a comment -

          Committed timeout_colcount.diff with revision 522445.

          Show
          Knut Anders Hatlen added a comment - Committed timeout_colcount.diff with revision 522445.
          Hide
          Knut Anders Hatlen added a comment -

          Forgot to mention that all tests passed.

          Show
          Knut Anders Hatlen added a comment - Forgot to mention that all tests passed.
          Hide
          Knut Anders Hatlen added a comment -

          Attaching a simple patch to fix issue #1 and #3 mentioned above. I have not been able to measure any performance improvement, but I think it's a good change anyway. At least, it eliminates one call to EmbedResultSet.checkIfClosed() and one call to EmbedStatement.checkStatus() per creation of an EmbedResultSet instance.

          Show
          Knut Anders Hatlen added a comment - Attaching a simple patch to fix issue #1 and #3 mentioned above. I have not been able to measure any performance improvement, but I think it's a good change anyway. At least, it eliminates one call to EmbedResultSet.checkIfClosed() and one call to EmbedStatement.checkStatus() per creation of an EmbedResultSet instance.
          Hide
          Andrew McIntyre added a comment -

          The "More Actions:" drop-down box has a rename page function. Renamed the page to http://wiki.apache.org/db-derby/DerbyBug1876

          Show
          Andrew McIntyre added a comment - The "More Actions:" drop-down box has a rename page function. Renamed the page to http://wiki.apache.org/db-derby/DerbyBug1876
          Hide
          Daniel John Debrunner added a comment -

          Link to wiki page capturing the performance progress due to changes made related to this issue.
          http://wiki.apache.org/db-derby/DerbyBug1872
          (yes I got the bug number wrong in the wiki page )

          Show
          Daniel John Debrunner added a comment - Link to wiki page capturing the performance progress due to changes made related to this issue. http://wiki.apache.org/db-derby/DerbyBug1872 (yes I got the bug number wrong in the wiki page )
          Hide
          Knut Anders Hatlen added a comment -

          To get an impression of the allocation/gc cost for the statistics, I made all the runtime statistics fields in BasicNoPutResultSetImpl, RowResultSet and UnionResultSet static (these are the ResultSet classes used by the query in derby1867.java). I ran derby1876.java ten times with and ten times without the changes. On average, the test used 15% less time with static fields on Solaris, JVM 1.6. On FreeBSD, JVM 1.5, the time spent was 20% lower with static fields.

          Show
          Knut Anders Hatlen added a comment - To get an impression of the allocation/gc cost for the statistics, I made all the runtime statistics fields in BasicNoPutResultSetImpl, RowResultSet and UnionResultSet static (these are the ResultSet classes used by the query in derby1867.java). I ran derby1876.java ten times with and ten times without the changes. On average, the test used 15% less time with static fields on Solaris, JVM 1.6. On FreeBSD, JVM 1.5, the time spent was 20% lower with static fields.
          Hide
          Daniel John Debrunner added a comment -

          Modified version of derby1876.java that uses the same basic SQL statement but with column names and then fetches the data using ResultSet.getInt(String). Used to show the performance improvement by changes made in DERBY-1862.
          Also shows that even with the changes for DERBY-1862, fetching by column name operates about 60% the performance of fetching by index, with this simple select case.

          Show
          Daniel John Debrunner added a comment - Modified version of derby1876.java that uses the same basic SQL statement but with column names and then fetches the data using ResultSet.getInt(String). Used to show the performance improvement by changes made in DERBY-1862 . Also shows that even with the changes for DERBY-1862 , fetching by column name operates about 60% the performance of fetching by index, with this simple select case.
          Hide
          Daniel John Debrunner added a comment -

          I've thought in the past about having the common statistic information in a language ResultSet implementation that could wrap the real langugae ResultSet. Then the ResultSetFactory would return a instance of the real ResultSet wrapped with a StatisticsResultSet when timing etc. was enabled. The StatisticsResultSet would have code like:

          class StatisticsResultSet {
          private long openTime;

          private final ResultSet rs;

          pubic void openCore()

          { long s = System.currentTimeMillis(); rs.openCore(); long e = System.currentTimeMillis(); openTime += (e -s); }

          This then removes the common fields and code to get the times from all the ResultSet classes, also leading to consistency.

          The issue with this that code that performs instanceof checks to get specific behaviour of a ResultSet no longer works, though there are solutions to that:
          A) Don't do it, it breaks the api in that the ResultSetFactory says it is returning a ResultSet, so that's all the callers should depend on.
          B) Follow the JBDC wrapper model, have a getRealResultSet() call in ResultSet api which would return this for most implementations, but the wrapped rs for StatisticsResultSet. Then code that needed to access the specific class would use the return from that method.

          Show
          Daniel John Debrunner added a comment - I've thought in the past about having the common statistic information in a language ResultSet implementation that could wrap the real langugae ResultSet. Then the ResultSetFactory would return a instance of the real ResultSet wrapped with a StatisticsResultSet when timing etc. was enabled. The StatisticsResultSet would have code like: class StatisticsResultSet { private long openTime; private final ResultSet rs; pubic void openCore() { long s = System.currentTimeMillis(); rs.openCore(); long e = System.currentTimeMillis(); openTime += (e -s); } This then removes the common fields and code to get the times from all the ResultSet classes, also leading to consistency. The issue with this that code that performs instanceof checks to get specific behaviour of a ResultSet no longer works, though there are solutions to that: A) Don't do it, it breaks the api in that the ResultSetFactory says it is returning a ResultSet, so that's all the callers should depend on. B) Follow the JBDC wrapper model, have a getRealResultSet() call in ResultSet api which would return this for most implementations, but the wrapped rs for StatisticsResultSet. Then code that needed to access the specific class would use the return from that method.
          Hide
          Knut Anders Hatlen added a comment -

          Moving one level down in the code and looking at the internal ResultSet hierarchy (not java.sql.ResultSet) could also be useful. I have noticed that garbage collection takes a lot of CPU time when running simple select statements in Derby. Using DTrace to collect information about object allocation, I observed that most of the garbage were objects of type TableScanResultSet. Although many classes were instanciated more frequently, they were relatively small and didn't contribute so much to the total number of bytes allocated. However, TableScanResultSet has almost ninety fields, which probably means each object is at least 400 bytes, even when all fields are null. Refactoring to reduce bare-object size would be good.

          For instance, BasicNoPutResultSetImpl (one of TSRS's super classes) has a number of fields to collect timing statistics: 1 boolean, 7 longs and 3 ints. That should be about 70 bytes per object. To reduce garbage produced when timing is off, timing information could be factored out into a statistics class. Then we only need one field in the result set class, and it could be null when timing is off.

          Show
          Knut Anders Hatlen added a comment - Moving one level down in the code and looking at the internal ResultSet hierarchy (not java.sql.ResultSet) could also be useful. I have noticed that garbage collection takes a lot of CPU time when running simple select statements in Derby. Using DTrace to collect information about object allocation, I observed that most of the garbage were objects of type TableScanResultSet. Although many classes were instanciated more frequently, they were relatively small and didn't contribute so much to the total number of bytes allocated. However, TableScanResultSet has almost ninety fields, which probably means each object is at least 400 bytes, even when all fields are null. Refactoring to reduce bare-object size would be good. For instance, BasicNoPutResultSetImpl (one of TSRS's super classes) has a number of fields to collect timing statistics: 1 boolean, 7 longs and 3 ints. That should be about 70 bytes per object. To reduce garbage produced when timing is off, timing information could be factored out into a statistics class. Then we only need one field in the result set class, and it could be null when timing is off.
          Hide
          Daniel John Debrunner added a comment -

          DERBY-827 reduces overhead in the SQL portion of simple queries by re-using the ResultSet true tha the activation and JDBC ResultSet use.

          Show
          Daniel John Debrunner added a comment - DERBY-827 reduces overhead in the SQL portion of simple queries by re-using the ResultSet true tha the activation and JDBC ResultSet use.
          Hide
          Daniel John Debrunner added a comment -

          JDBC requires that each execution of ps.executeQuery() returns a new ResultSet object, thus looking at the code within EmbedResultSet() constructor is worth while.
          There are at least four sub-optimal pieces of code:

          1) (long)stmt.getQueryTimeout() * 1000L;
          Each creation a multiplication by 1000 is executed, could be avoided by having the
          value in ms calculated once and stored in EmbedStatement when setQueryTimeout is called.

          2) stmt.getResultSetConcurrency()
          This call checks the Statement object to see if it is open, but it must be open since it is creating a ResultSet. Can be avoided by having EmbedResultSet use the final field that holds the concurrency, if iw was made package protected.

          3) final int columnCount = getMetaData().getColumnCount();
          This creates a ResultSetMetaData object that will not be used in the typical case (at least I think it will not be used). Using resultDescription.getColumnCount() would get the count with no object creation.

          4) this.currentRow = factory.getValueRow(columnCount);
          currentRow.setRowArray(null);

          The call to getValueRow() creates a ValueRow with a new DataValueDescriptor array within it, the next line nulls out the refrence to the array. Thus this creates a short lived object for no value. Possibly work around is to have currentRow use the array directly, or create a ValueRow with no array within it.

          Show
          Daniel John Debrunner added a comment - JDBC requires that each execution of ps.executeQuery() returns a new ResultSet object, thus looking at the code within EmbedResultSet() constructor is worth while. There are at least four sub-optimal pieces of code: 1) (long)stmt.getQueryTimeout() * 1000L; Each creation a multiplication by 1000 is executed, could be avoided by having the value in ms calculated once and stored in EmbedStatement when setQueryTimeout is called. 2) stmt.getResultSetConcurrency() This call checks the Statement object to see if it is open, but it must be open since it is creating a ResultSet. Can be avoided by having EmbedResultSet use the final field that holds the concurrency, if iw was made package protected. 3) final int columnCount = getMetaData().getColumnCount(); This creates a ResultSetMetaData object that will not be used in the typical case (at least I think it will not be used). Using resultDescription.getColumnCount() would get the count with no object creation. 4) this.currentRow = factory.getValueRow(columnCount); currentRow.setRowArray(null); The call to getValueRow() creates a ValueRow with a new DataValueDescriptor array within it, the next line nulls out the refrence to the array. Thus this creates a short lived object for no value. Possibly work around is to have currentRow use the array directly, or create a ValueRow with no array within it.
          Hide
          Daniel John Debrunner added a comment -

          Simple java program, derby1876, that runs a VALUES statement a number of times and measures its performance. The SQL query is simple so that the store and most of the SQL language layer is factored out of measurement, to allow focus on the base infrastructure.

          Can be used as a test-bed for performance investigations for this issue.

          Show
          Daniel John Debrunner added a comment - Simple java program, derby1876, that runs a VALUES statement a number of times and measures its performance. The SQL query is simple so that the store and most of the SQL language layer is factored out of measurement, to allow focus on the base infrastructure. Can be used as a test-bed for performance investigations for this issue.

            People

            • Assignee:
              Unassigned
              Reporter:
              Daniel John Debrunner
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development