Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.1.2.1, 10.1.3.1
    • Fix Version/s: 10.2.1.6, 10.2.2.0, 10.3.1.4
    • Component/s: None
    • Labels:
      None
    • Environment:
      WinXp, JRE 1.5_6., Hibernate 3.1
    • Urgency:
      Normal
    • Bug behavior facts:
      Performance

      Description

      We are currently developing a system where we load between 1000 and 5000 objects in one go. The user can load different chunks of objects at any time as he/she is navigating.
      The system consist of a java application which accesses derby via hibernate.
      During profiling we discovered that the org.apache.derby.iapi.util.StringUtil is the biggest bottleneck in the system.
      The method SQLEqualsIgnoreCase(String s1, String s2) is doing upperCase on both s1 and s2, all the time.
      By putting the uppcase value into a Hashtable and using the input-string as key we increates the performance with about 40%.
      Our test-users report that the system now seems to run at "double speed".

      The class calling the StringUtil.SQLEqualsIgnoreCase in this case is

      org.apache.derby.impl.jdbc.EmbedResultSet

      This class should also be checked as it seems to do a lot of looping.
      It might be a canditate for hashing, as it is stated in the code:
      "// REVISIT: we might want to cache our own info..."

      Here is a diff agains the 10.1.3.1 source for org.apache.derby.iapi.util.StringUtil

      22a23
      > import java.util.Hashtable;
      319c320,326
      < return s1.toUpperCase(Locale.ENGLISH).equals(s2.toUpperCase(Locale.ENGLISH));

      > {
      > String s1Up = (String) uppercaseMap.get(s1);
      > if (s1Up == null)
      >

      { > s1Up = s1.toUpperCase(Locale.ENGLISH); > uppercaseMap.put(s1,s1Up); > }

      320a328,332
      > String s2Up = (String) uppercaseMap.get(s2);
      > if (s2Up == null)
      >

      { > s2Up = s2.toUpperCase(Locale.ENGLISH); > uppercaseMap.put(s2,s2Up); 321a334 > return s1Up.equals(s2Up); 322a336,339 > //return s1.toUpperCase(Locale.ENGLISH).equals(s2.toUpperCase(Locale.ENGLISH)); > }

      > }
      > private static Hashtable uppercaseMap = new Hashtable();

      1. DERBY-1862v3.diff
        3 kB
        Andreas Korneliussen
      2. DERBY-1862v2.diff
        3 kB
        Andreas Korneliussen
      3. DERBY-1862.diff
        1 kB
        Andreas Korneliussen
      4. DERBY-1696v2.diff
        11 kB
        Andreas Korneliussen

        Activity

        Hide
        Myrna van Lunteren added a comment -

        Closing issue altogether after adjusting fix version

        Show
        Myrna van Lunteren added a comment - Closing issue altogether after adjusting fix version
        Hide
        Daniel John Debrunner added a comment -

        With a modified version of the simple performance test case in DERBY-1876 (ie. using column names) I saw a 35% performance improvement due to this fix. I will attached the modified test case to DERBY-1876 as it's interesting for that investigation as well.

        Thanks very much for Tore Andre Olmheim for raising the issue and providing the insight into where the problem lay, and to Andreas Korneliussen for providing the patch.

        Show
        Daniel John Debrunner added a comment - With a modified version of the simple performance test case in DERBY-1876 (ie. using column names) I saw a 35% performance improvement due to this fix. I will attached the modified test case to DERBY-1876 as it's interesting for that investigation as well. Thanks very much for Tore Andre Olmheim for raising the issue and providing the insight into where the problem lay, and to Andreas Korneliussen for providing the patch.
        Hide
        Daniel John Debrunner added a comment -

        Patch DERBY-1862v3.dif committed revision 448949. Thanks Andreas for making the performance improvements.

        Show
        Daniel John Debrunner added a comment - Patch DERBY-1862 v3.dif committed revision 448949. Thanks Andreas for making the performance improvements.
        Hide
        Andreas Korneliussen added a comment -

        Attaching a modified patch where I have taken in the advice of not creating the map object in the constructor, and using ReuseFactory for getting Integer objects. Synchronization is done on "this" to protect the map from concurrent access while creating/populating it.

        Show
        Andreas Korneliussen added a comment - Attaching a modified patch where I have taken in the advice of not creating the map object in the constructor, and using ReuseFactory for getting Integer objects. Synchronization is done on "this" to protect the map from concurrent access while creating/populating it.
        Hide
        Daniel John Debrunner added a comment -

        The patch looks fine and I assume works correctly, but there is room for improvement, related to performance:

        • The map is always created, even if it is never used. This will increase memory usage and slow down existing applications a little.
        • The map is created and filled in for an individual ResultSet, but in fact it's a property of the compiled plan, so ideally
          there could be one copy of this map per compiled plan. So with a multi-user application running the same statements this
          patch will consume more memory as a factor of the number of users. Now there is probably other meta-data aspects of
          an EmbedResultSet that could be shared at a plan level, so this could be seen as future cleanup.
        • Rather than using new Integer() as the values for the hash map, the code could use ReuseFactory.getInteger() to
          reduce memory usage.

        I'm fine with the patch being committed, but would like to ensure these optimizations are not lost.

        Show
        Daniel John Debrunner added a comment - The patch looks fine and I assume works correctly, but there is room for improvement, related to performance: The map is always created, even if it is never used. This will increase memory usage and slow down existing applications a little. The map is created and filled in for an individual ResultSet, but in fact it's a property of the compiled plan, so ideally there could be one copy of this map per compiled plan. So with a multi-user application running the same statements this patch will consume more memory as a factor of the number of users. Now there is probably other meta-data aspects of an EmbedResultSet that could be shared at a plan level, so this could be seen as future cleanup. Rather than using new Integer() as the values for the hash map, the code could use ReuseFactory.getInteger() to reduce memory usage. I'm fine with the patch being committed, but would like to ensure these optimizations are not lost.
        Hide
        Andreas Korneliussen added a comment -

        Attached incorrect patch (DERBY-1696v2.diff). The correct patch is DERBY-1862v2.diff.

        Show
        Andreas Korneliussen added a comment - Attached incorrect patch ( DERBY-1696 v2.diff). The correct patch is DERBY-1862 v2.diff.
        Hide
        Andreas Korneliussen added a comment -

        The attached patch makes a map of column names to column number. The map is populated when the first call to findColumn is made.

        Show
        Andreas Korneliussen added a comment - The attached patch makes a map of column names to column number. The map is populated when the first call to findColumn is made.
        Hide
        Tore Andre Olmheim added a comment -

        Yes I agree with Andreas that my patch will leak memory.
        In our system this method is mainly called from
        findColumnName(String columnName) in org.apache.derby.impl.jdbc.EmbedResultSet.

        This method is only comparing column-names, and there is a limited number of column-names in a database
        so the memory leak will not be a big issue.

        But, of course if the SQLEqualsIgnoreCase is used by other classes, then Andreas has a good point.
        I still think my Hashtable version will be the fastest.

        The best solution would be to refactor findColumnName(String columnName) in org.apache.derby.impl.jdbc.EmbedResultSet.

        I will leave it to the persons, who knows the derby design well, to decide what approach to take.

        Show
        Tore Andre Olmheim added a comment - Yes I agree with Andreas that my patch will leak memory. In our system this method is mainly called from findColumnName(String columnName) in org.apache.derby.impl.jdbc.EmbedResultSet. This method is only comparing column-names, and there is a limited number of column-names in a database so the memory leak will not be a big issue. But, of course if the SQLEqualsIgnoreCase is used by other classes, then Andreas has a good point. I still think my Hashtable version will be the fastest. The best solution would be to refactor findColumnName(String columnName) in org.apache.derby.impl.jdbc.EmbedResultSet. I will leave it to the persons, who knows the derby design well, to decide what approach to take.
        Hide
        Andreas Korneliussen added a comment -

        Attached is a patch which uses another approach to improve the SQLEqualsIgnoreCase method. The patch check the identity and length of the strings to be compared, before doing conversions to uppercase with english locale.

        String.toUpperCase(..) with english locale, should return a string with the same number of characters, and it should therefore be valid to do a check of number of characters before doing any conversions.

        The patch which is posted as part of the description, will leak memory, since strings are never removed from the upperCaseMap.

        Show
        Andreas Korneliussen added a comment - Attached is a patch which uses another approach to improve the SQLEqualsIgnoreCase method. The patch check the identity and length of the strings to be compared, before doing conversions to uppercase with english locale. String.toUpperCase(..) with english locale, should return a string with the same number of characters, and it should therefore be valid to do a check of number of characters before doing any conversions. The patch which is posted as part of the description, will leak memory, since strings are never removed from the upperCaseMap.

          People

          • Assignee:
            Andreas Korneliussen
            Reporter:
            Tore Andre Olmheim
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development