Lucene - Core
  1. Lucene - Core
  2. LUCENE-5372

Replace StringBuffer with StringBuilder where possible, add to forbidden-apis

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      Patch Available

      Description

      This is pretty minor, but I found a few issues with the toString implementations while looking through the facet data structures.

      The most egregious is the use of string concatenation in the IntArray class. I have fixed that using StringBuilders. I also noticed that other classes were using StringBuffer instead of StringBuilder. According to the javadoc,

      "This class is designed for use as a drop-in replacement for StringBuffer in places where the string buffer was being used by a single thread (as is generally the case). Where possible, it is recommended that this class be used in preference to StringBuffer as it will be faster under most implementations."

      1. 5372.patch
        7 kB
        Joshua Hartman
      2. 5372-lucene5339.patch
        32 kB
        Joshua Hartman
      3. 5372-v2.patch
        39 kB
        Joshua Hartman
      4. LUCENE-5372-forbidden.patch
        0.8 kB
        Uwe Schindler
      5. LUCENE-5372-StringBuffer.patch
        33 kB
        Uwe Schindler

        Activity

        Hide
        Joshua Hartman added a comment -

        I was using git-svn, but I believe the patch should still apply with patch -p1

        Show
        Joshua Hartman added a comment - I was using git-svn, but I believe the patch should still apply with patch -p1
        Hide
        Dawid Weiss added a comment -

        Looks good to me and I think it's applicable to 4.x and 5.x (StringBuilder requires Java >= 1.5 but both of these branches do, right)?

        Show
        Dawid Weiss added a comment - Looks good to me and I think it's applicable to 4.x and 5.x (StringBuilder requires Java >= 1.5 but both of these branches do, right)?
        Hide
        Mark Miller added a comment -

        Yes, 4x=6, 5x=7

        Show
        Mark Miller added a comment - Yes, 4x=6, 5x=7
        Hide
        Joshua Hartman added a comment -

        I can also optimize memory usage by precalculating the maximum size in advance for the StringBuilder for each collection. May be overkill - what are your thoughts? I am new here.

        http://wiki.apache.org/lucene-java/HowToContribute implies I should wait for the patch to be pulled in by a lucene dev. Is this a correct interpretation?

        Show
        Joshua Hartman added a comment - I can also optimize memory usage by precalculating the maximum size in advance for the StringBuilder for each collection. May be overkill - what are your thoughts? I am new here. http://wiki.apache.org/lucene-java/HowToContribute implies I should wait for the patch to be pulled in by a lucene dev. Is this a correct interpretation?
        Hide
        Dawid Weiss added a comment -

        No need to be paranoid about performance here, Josh. The patch is fine, I'll apply it, although tomorrow because it's gotten really late over here.

        Show
        Dawid Weiss added a comment - No need to be paranoid about performance here, Josh. The patch is fine, I'll apply it, although tomorrow because it's gotten really late over here.
        Hide
        Michael McCandless added a comment -

        Hi Joshua, thank you for the patch here, but we are in the process of simplifying the facet APIs (on this branch: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene5339 ), and I believe most (all?) of the affected code here has been removed. So I would hold off for now, or maybe check out the branch and see if any of these O(N^2) problems still remain.

        Thanks!

        Show
        Michael McCandless added a comment - Hi Joshua, thank you for the patch here, but we are in the process of simplifying the facet APIs (on this branch: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene5339 ), and I believe most (all?) of the affected code here has been removed. So I would hold off for now, or maybe check out the branch and see if any of these O(N^2) problems still remain. Thanks!
        Hide
        Dawid Weiss added a comment -

        Do you want me to hold it, Mike? Josh – perhaps you could rebase your patch against the branch Mike pointed to (if there's anything left)?

        Show
        Dawid Weiss added a comment - Do you want me to hold it, Mike? Josh – perhaps you could rebase your patch against the branch Mike pointed to (if there's anything left)?
        Hide
        Uwe Schindler added a comment -

        Oh oh, we should put StringBuffer on the forbidden-apis list. It is one entry in our base.txt signatures file (please don't add a new one).

        We replaced all StringBuffers already earlier when we moved to Lucene 3.0 (which is the first one for Java 5), so we can disallow StringBuffer everywhere!

        Show
        Uwe Schindler added a comment - Oh oh, we should put StringBuffer on the forbidden-apis list. It is one entry in our base.txt signatures file (please don't add a new one). We replaced all StringBuffers already earlier when we moved to Lucene 3.0 (which is the first one for Java 5), so we can disallow StringBuffer everywhere!
        Hide
        Uwe Schindler added a comment - - edited

        This already fails in lucene-core's test! So we should replace them everywhere (although tests is not an issue).

        ant check-forbidden-apis
        
        [...]
        
        -check-forbidden-base:
        [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.7
        [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.7
        [forbidden-apis] Reading API signatures: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\base.txt
        [forbidden-apis] Loading classes to check...
        [forbidden-apis] Scanning for API signatures and dependencies...
        [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization]
        [forbidden-apis]   in org.apache.lucene.TestSearchForDuplicates (TestSearchForDuplicates.java:56)
        [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization]
        [forbidden-apis]   in org.apache.lucene.TestSearchForDuplicates (TestSearchForDuplicates.java:64)
        [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization]
        [forbidden-apis]   in org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl (CharTermAttributeImpl.java:148)
        [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization]
        [forbidden-apis]   in org.apache.lucene.TestSearch (TestSearch.java:95)
        [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization]
        [forbidden-apis]   in org.apache.lucene.TestSearch (TestSearch.java:103)
        [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization]
        [forbidden-apis]   in org.apache.lucene.index.TestDoc (TestDoc.java:152)
        [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization]
        [forbidden-apis]   in org.apache.lucene.index.TestDoc (TestDoc.java:193)
        [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization]
        [forbidden-apis]   in org.apache.lucene.analysis.tokenattributes.TestCharTermAttributeImpl (TestCharTermAttributeImpl.java:179)
        [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization]
        [forbidden-apis]   in org.apache.lucene.analysis.tokenattributes.TestCharTermAttributeImpl (TestCharTermAttributeImpl.java:216)
        [forbidden-apis] Scanned 2369 (and 462 related) class file(s) for forbidden API invocations (in 1.19s), 9 error(s).
        
        Show
        Uwe Schindler added a comment - - edited This already fails in lucene-core's test! So we should replace them everywhere (although tests is not an issue). ant check-forbidden-apis [...] -check-forbidden-base: [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.7 [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.7 [forbidden-apis] Reading API signatures: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr3\lucene\tools\forbiddenApis\base.txt [forbidden-apis] Loading classes to check... [forbidden-apis] Scanning for API signatures and dependencies... [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization] [forbidden-apis] in org.apache.lucene.TestSearchForDuplicates (TestSearchForDuplicates.java:56) [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization] [forbidden-apis] in org.apache.lucene.TestSearchForDuplicates (TestSearchForDuplicates.java:64) [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization] [forbidden-apis] in org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl (CharTermAttributeImpl.java:148) [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization] [forbidden-apis] in org.apache.lucene.TestSearch (TestSearch.java:95) [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization] [forbidden-apis] in org.apache.lucene.TestSearch (TestSearch.java:103) [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization] [forbidden-apis] in org.apache.lucene.index.TestDoc (TestDoc.java:152) [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization] [forbidden-apis] in org.apache.lucene.index.TestDoc (TestDoc.java:193) [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization] [forbidden-apis] in org.apache.lucene.analysis.tokenattributes.TestCharTermAttributeImpl (TestCharTermAttributeImpl.java:179) [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization] [forbidden-apis] in org.apache.lucene.analysis.tokenattributes.TestCharTermAttributeImpl (TestCharTermAttributeImpl.java:216) [forbidden-apis] Scanned 2369 (and 462 related) class file(s) for forbidden API invocations (in 1.19s), 9 error(s).
        Hide
        Uwe Schindler added a comment -

        his one is wanted, but mostly obsolete - maybe put on exclusion list:

        [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization]
        [forbidden-apis]   in org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl (CharTermAttributeImpl.java:148)
        
        Show
        Uwe Schindler added a comment - his one is wanted, but mostly obsolete - maybe put on exclusion list: [forbidden-apis] Forbidden class/interface use: java.lang.StringBuffer [Use StringBuilder instead, which has no synchronization] [forbidden-apis] in org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl (CharTermAttributeImpl.java:148)
        Hide
        Uwe Schindler added a comment -

        Should we put the remaining StringBuffers on a separate issue or fix it here?

        Show
        Uwe Schindler added a comment - Should we put the remaining StringBuffers on a separate issue or fix it here?
        Hide
        Dawid Weiss added a comment -

        I think we can fix it here since Josh brought that up.

        Show
        Dawid Weiss added a comment - I think we can fix it here since Josh brought that up.
        Hide
        Michael McCandless added a comment -

        Do you want me to hold it, Mike?

        Yes, please!

        +1 to fix StringBuffers here! Thanks, Joshua.

        Show
        Michael McCandless added a comment - Do you want me to hold it, Mike? Yes, please! +1 to fix StringBuffers here! Thanks, Joshua.
        Hide
        Joshua Hartman added a comment -

        I'll make the StringBuffer -> StringBuilder fix on trunk the branch mentioned by Mike and attach new patches for each. Expect the patch this evening or tomorrow.

        Show
        Joshua Hartman added a comment - I'll make the StringBuffer -> StringBuilder fix on trunk the branch mentioned by Mike and attach new patches for each. Expect the patch this evening or tomorrow.
        Hide
        Joshua Hartman added a comment -

        Uploaded versions of the patch for both trunk and lucene5339 branches for replacing StringBuffer -> StringBuilder. Due to API constraints it is not possible to do so in all cases.

        Mike was also correct. Code related to the specific O(n^2) issue this JIRA was started for no longer exists on lucene5339.

        Show
        Joshua Hartman added a comment - Uploaded versions of the patch for both trunk and lucene5339 branches for replacing StringBuffer -> StringBuilder. Due to API constraints it is not possible to do so in all cases. Mike was also correct. Code related to the specific O(n^2) issue this JIRA was started for no longer exists on lucene5339.
        Hide
        Dawid Weiss added a comment -

        I wanted to apply these patches but then looked deeper and it seems we can't just apply them without some consideration. For example, while reviewing, I noticed things such as this one:

        - * All use of StringBuffers has been refactored to StringBuilder for speed.
        + * All use of StringBuilders has been refactored to StringBuilder for speed.
        

        which seem to be an auto-replacement artifact. While this is a no-problem, this may be:

        +++ b/lucene/core/src/java/org/apache/lucene/analysis/tokenattributes/CharTermAttributeImpl.java
        @@ -144,8 +144,8 @@ public class CharTermAttributeImpl extends AttributeImpl implements CharTermAttr
               } else if (csq instanceof CharBuffer && ((CharBuffer) csq).hasArray()) {
                 final CharBuffer cb = (CharBuffer) csq;
                 System.arraycopy(cb.array(), cb.arrayOffset() + cb.position() + start, termBuffer, termLength, len);
        -      } else if (csq instanceof StringBuffer) {
        -        ((StringBuffer) csq).getChars(start, end, termBuffer, termLength);
        +      } else if (csq instanceof StringBuilder) {
        +        ((StringBuilder) csq).getChars(start, end, termBuffer, termLength);
        

        but CharTermAttributeImpl already has an if clause for StringBuilder, the full code is:

              if (csq instanceof String) {
                ((String) csq).getChars(start, end, termBuffer, termLength);
              } else if (csq instanceof StringBuilder) {
                ((StringBuilder) csq).getChars(start, end, termBuffer, termLength);
              } else if (csq instanceof CharTermAttribute) {
                System.arraycopy(((CharTermAttribute) csq).buffer(), start, termBuffer, termLength, len);
              } else if (csq instanceof CharBuffer && ((CharBuffer) csq).hasArray()) {
                final CharBuffer cb = (CharBuffer) csq;
                System.arraycopy(cb.array(), cb.arrayOffset() + cb.position() + start, termBuffer, termLength, len);
              } else if (csq instanceof StringBuffer) {
                ((StringBuffer) csq).getChars(start, end, termBuffer, termLength);
              } else {
                while (start < end)
                  termBuffer[termLength++] = csq.charAt(start++);
                // no fall-through here, as termLength is updated!
                return this;
              }
        

        I would actually leave it for Uwe to modify the api checker rules and then hand-pick offenders. Your contribution won't be lost, Joshua, it'll just go in via another route (not a direct patch, rather a good suggestion .

        Show
        Dawid Weiss added a comment - I wanted to apply these patches but then looked deeper and it seems we can't just apply them without some consideration. For example, while reviewing, I noticed things such as this one: - * All use of StringBuffers has been refactored to StringBuilder for speed. + * All use of StringBuilders has been refactored to StringBuilder for speed. which seem to be an auto-replacement artifact. While this is a no-problem, this may be: +++ b/lucene/core/src/java/org/apache/lucene/analysis/tokenattributes/CharTermAttributeImpl.java @@ -144,8 +144,8 @@ public class CharTermAttributeImpl extends AttributeImpl implements CharTermAttr } else if (csq instanceof CharBuffer && ((CharBuffer) csq).hasArray()) { final CharBuffer cb = (CharBuffer) csq; System .arraycopy(cb.array(), cb.arrayOffset() + cb.position() + start, termBuffer, termLength, len); - } else if (csq instanceof StringBuffer ) { - (( StringBuffer ) csq).getChars(start, end, termBuffer, termLength); + } else if (csq instanceof StringBuilder) { + ((StringBuilder) csq).getChars(start, end, termBuffer, termLength); but CharTermAttributeImpl already has an if clause for StringBuilder, the full code is: if (csq instanceof String ) { (( String ) csq).getChars(start, end, termBuffer, termLength); } else if (csq instanceof StringBuilder) { ((StringBuilder) csq).getChars(start, end, termBuffer, termLength); } else if (csq instanceof CharTermAttribute) { System .arraycopy(((CharTermAttribute) csq).buffer(), start, termBuffer, termLength, len); } else if (csq instanceof CharBuffer && ((CharBuffer) csq).hasArray()) { final CharBuffer cb = (CharBuffer) csq; System .arraycopy(cb.array(), cb.arrayOffset() + cb.position() + start, termBuffer, termLength, len); } else if (csq instanceof StringBuffer ) { (( StringBuffer ) csq).getChars(start, end, termBuffer, termLength); } else { while (start < end) termBuffer[termLength++] = csq.charAt(start++); // no fall-through here, as termLength is updated! return this ; } I would actually leave it for Uwe to modify the api checker rules and then hand-pick offenders. Your contribution won't be lost, Joshua, it'll just go in via another route (not a direct patch, rather a good suggestion .
        Hide
        Uwe Schindler added a comment - - edited

        Here is a new patch that was created from scratch by using Eclipse search for java.lang.StringBuffer:

        • The queryparser problems are autogen'ed code. I modified the build.xml to do a replacement in ParseException.java and TokenMgrError.java (queryparser module and solr-core)
        • Some tests called StringWriter.getBuffer().toString() - this was only detected by forbidden-apis
        • Some code parts using regex.Matcher have to use StringBuilder, because Matcher internally also uses this, so appendReplacement() and appendTail() have to stay
        • I removed some useless methods in Solr's commons-csv fork. I hope we can remove that soon!
        • In some tests, Solr used StringBuffer, but synchronization seems to be needed, I left untouched.

        Because we have to use StringBuffer quite often, I did not add it to forbidden apis, because the exclusions would prevent other forbidden stuff in the excluded files to be detected.

        I will commit this later and close this issue. Since the facet APIs were cleaned up, the original issue in IntArray disappeared, right? (the class is no longer there).

        Show
        Uwe Schindler added a comment - - edited Here is a new patch that was created from scratch by using Eclipse search for java.lang.StringBuffer: The queryparser problems are autogen'ed code. I modified the build.xml to do a replacement in ParseException.java and TokenMgrError.java (queryparser module and solr-core) Some tests called StringWriter.getBuffer().toString() - this was only detected by forbidden-apis Some code parts using regex.Matcher have to use StringBuilder, because Matcher internally also uses this, so appendReplacement() and appendTail() have to stay I removed some useless methods in Solr's commons-csv fork. I hope we can remove that soon! In some tests, Solr used StringBuffer, but synchronization seems to be needed, I left untouched. Because we have to use StringBuffer quite often, I did not add it to forbidden apis, because the exclusions would prevent other forbidden stuff in the excluded files to be detected. I will commit this later and close this issue. Since the facet APIs were cleaned up, the original issue in IntArray disappeared, right? (the class is no longer there).
        Hide
        ASF subversion and git services added a comment -

        Commit 1555645 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1555645 ]

        LUCENE-5372: Replace StringBuffer by StringBuilder, where possible

        Show
        ASF subversion and git services added a comment - Commit 1555645 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1555645 ] LUCENE-5372 : Replace StringBuffer by StringBuilder, where possible
        Hide
        ASF subversion and git services added a comment -

        Commit 1555646 from Uwe Schindler in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1555646 ]

        Merged revision(s) 1555645 from lucene/dev/trunk:
        LUCENE-5372: Replace StringBuffer by StringBuilder, where possible

        Show
        ASF subversion and git services added a comment - Commit 1555646 from Uwe Schindler in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1555646 ] Merged revision(s) 1555645 from lucene/dev/trunk: LUCENE-5372 : Replace StringBuffer by StringBuilder, where possible
        Hide
        Uwe Schindler added a comment -

        Thanks Joshua!

        If somebody has an idea how to add StringBuffer to forbidden without excluding tons of files, reopen.

        One idea for forbidden-apis would be: Add some new "@IgnoreForbidden" annotation (class file, not runtime visible), which makes the forbidden-api checker ignore methods or classes with that ann. This would allow to ignore only very specific code parts! Backside: You need a JAR file while compiling with the annotation...

        Show
        Uwe Schindler added a comment - Thanks Joshua! If somebody has an idea how to add StringBuffer to forbidden without excluding tons of files, reopen. One idea for forbidden-apis would be: Add some new "@IgnoreForbidden" annotation (class file, not runtime visible), which makes the forbidden-api checker ignore methods or classes with that ann. This would allow to ignore only very specific code parts! Backside: You need a JAR file while compiling with the annotation...

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Joshua Hartman
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development