We are using the <sql> task to read in a .sql file to populate a database. One of the INSERT statements that is used to populate a variable unlimited length column (declared as "TEXT" for Postgresql) is 1.5 million characters long. Using Ant 1.6.5 with JDK 1.4, the task completes in a few seconds. However, using the same version of Ant with JDK 1.5.0 (build 1.5.0_05-b05), it takes 36 minutes. After debugging the SQLExec class, it appears that the runStatements method contains an inefficient line of code. Line 483 contains this code: if ((delimiterType.equals(DelimiterType.NORMAL) && sql.toString().endsWith(delimiter)) By using the buffer's toString(), the entire buffer is getting searched (in addition to unneccessarily creating a new string) instead of using the last line read to locate the delimiter. If you change it to: if ((delimiterType.equals(DelimiterType.NORMAL) && line.endsWith(delimiter)) the problem is fixed.
If this statement makes that big a difference, I'd say something is wrong with JDK 1.5.0. If a method executes in a few seconds on jdk 1.4 it should not take 36 minutes on jdk 1.5. Even if the later platform version offers a more efficient method. Maybe the 36 minutes is mainly caused by swapping on the local computer, because either the 1.5 vm was loaded adittionally, or just because the 1.5 vm is slightly larger? sql is a StringBuffer(), and afaik not all java versions supported by ant have the endsWith method on the StringBuffer, therefore we cannot improve the efficiency in the propesed way.
This is actually because the implementation of String/StringBuffer is very different between jdk 1.4 and jdk 1.5. With jdk 1.4, a StringBuffer.toString() does create a String object but the content (char array) is just shared (as the string is immutable this a good optimization and was introduced in jdk 1.3 if I remember well) With jdk 1.5, the content is not shared anymore but copied directly. As you have a huge content, it creates this major bottleneck as memory is copied over and over and over.
Dear Mr Kruithof, I would be hesitant to say something is wrong with JDK 1.5. After all, the statement does in the case of a SQL that spans HUGE number of lines to continuously create large Strings. There is nothing wrong in the JDK 1.5 (execution wise), it is doing what it is being asked to do. Creating a big number of large strings will take a lot of time. In JDK 1.4, the hotspot compiler is probably doing some short circuiting to enhance the performance. The hotspot equivalent in JDK 1.5 is still immature in comparison. As per your second comment, I would advise that you re-read that gentlemen's statement, and retract yours. He suggested that the "endsWith()" method should be performed on "line" and not "sql". Your statement regarding StringBuffer and Java just does not make sense. Reading is fundamental. Thanks, Boon (In reply to comment #1) > If this statement makes that big a difference, I'd say something is wrong with > JDK 1.5.0. If a method executes in a few seconds on jdk 1.4 it should not take > 36 minutes on jdk 1.5. Even if the later platform version offers a more > efficient method. > > Maybe the 36 minutes is mainly caused by swapping on the local computer, because > either the 1.5 vm was loaded adittionally, or just because the 1.5 vm is > slightly larger? > > sql is a StringBuffer(), and afaik not all java versions supported by ant have > the endsWith method on the StringBuffer, therefore we cannot improve the > efficiency in the propesed way.
Actually to be more accurate the 'good optimization' is actually a trade-off as this can lead to major memory usage if you are reusing a StringBuffer and reusing it via setLength(0). The reason is that when the char array is not shared anymore and only belongs to the String object..u could actually be sharing an allocated 1000000 char array when the string is actually made of only 1 char in this char array. Do that over a loop and you could very easily get the gc crawling and end up with in an OOME
Stephane, Thanks for your enlightening post. I guess I should have read the source for the JDK. Like I said, reading is fundamental. Boon
Sorry, I do hesitate to call somthing a bug, but changing a method to have a new execution speed orders of magnitude slower as before is generally considered a bug, even if the functional behaviour remains the same. In case this delay is mainly caused by swapping (because the change in the java environment memory consumption is slightly up) it need not be a bug, but may be just bad luck, an option I left explicitly open. Sun confirmed this bug in JDK 1.5 (see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6219959) I indeed misread the sql to line change. StringBuffer sql may be appended to though, even after String line is added, so the proposed change is not equivalent to the original code. We get quite some proposed fixes/enhancements that use newer java features as available in jdk 1.2 Does changing the code to if ((delimiterType.equals(DelimiterType.NORMAL) && stringBufferEndsWith(sql,delimiter)) and introducing the method static boolean stringBufferEndsWith(StringBuffer b, String end) { if (b.length() < end.length()) { return false; } return (end.equals(b.substring(b.length()-end.length(),b.length()))); } fix the speed problem on jdk 1.5, yet keep the same behaviour? (in case it does I'll introduce this change)
Fixed in HEAD. I introduced a method StringUtils.endsWith(StringBuffer, String) that avoid using any JDK 1.4+ API and workaround any performance issue. Please test it confirm that the issue is fixed.
(In reply to comment #7) > Please test it confirm that the issue is fixed. for what it's worth, this bug has been completely resolved for us in Ant 1.7.0 (I work for the same company as the original bug submitter). Thank you so much for fixing this!!