We format a lot of double values to strings for PDF and PostScript output. We currently use DecimalFormat in a problematic way (DecimalFormat is not thread-safe). It also doesn't quite cover the requirements I have for the method in that particular context. Furthermore, this has shown as a minor hotspot in profiling sessions. The task is to write an implementation of the following method: static void formatDouble(double value, int decimals, int precision, StringBuffer target) value: the double value decimals: the number of decimal places behind the decimal point precision: the maximum precision for small values. If the value is between -1 and +1, the precision needs to be increased up to the number of decimal places indicated by this parameter. This is used in contexts where transformation matrices scale the content and a higher precision might be required. target: the formatted characters are appended to the given StringBuffer. A StringBuffer is chosen for efficiency so the number of String concatenations can be reduced if possible. A wrapper method returning a String is easily written around this method. The goal of this method is to have a compact representation of a double value while adressing accuracy requirements in the usage context. the decimal point is . (".", PERIOD) (this is not dependent on the user's locale) A few examples (consider decimals being 4 and precision being 8): 0.0 should be rendered as "0" 0.1 should be rendered as "0.1" 1234.1 should be rendered as "1234.1" 1234.1234567 should be rendered as "1234.1235" (note the trailing 5! Rounding!) 1234.00001 should be rendered as "1234" 0.00001 should be rendered as "0.00001" (here you see the effect of the "precision" parameter. 0.00000001 should be rendered as "0.00000001". 0.000000001 should be rendered as "0". Acceptance criteria: - The method has to prove to be faster than new DecimalFormat("0.######", new DecimalFormatSymbols(Locale.US)).format(value) on Sun Java 1.4, 1.5 an 6.0. - The method must be thread-safe. - The method must have a javadoc comment. - It must be accompanied by a JUnit TestCase testing all aspects of the method. - It should not depend on any additional library not already in the dependency list of Apache XML Graphics Commons (its destination in the end). Any volunteers? The prize: eternal glory and a big "thank you" from me! ;-) Otherwise, this is just a reminder for myself.
Performed no timings or comparisons yet, but the result is OK for the mentioned cases. How about: public static void formatDouble( double source, int decimals, int precision, StringBuffer target) { boolean isPositive = (source >= 0); long intPart = (long) (isPositive ? Math.floor(source) : Math.ceil(source)); int scale = (intPart >= 1) ? decimals : precision; long fracPart = (long) Math.abs(Math.round((source - intPart) * Math.pow(10, scale))); if (intPart != 0 || fracPart != 0) { /* non-zero value */ if (!isPositive) { /* negative value, insert sign */ target.append('-'); } /* append integer part*/ target.append(intPart); if (fracPart != 0) { /* append fractional part */ target.append('.'); /* insert leading zeroes */ while (fracPart < Math.pow(10, --scale)) { target.append('0'); } target.append(fracPart); /* remove trailing zeroes */ for (int i = target.length(); --i >= 0 && target.charAt(i) == '0'; ) { target.deleteCharAt(i); } } } else { target.append('0'); } }
Further info: formatting 50000 random doubles formatDouble(source, 4, 8, target) [216ms] seems to be slightly faster than target.append(Double.toString(source)) [236ms] while having the benefit of the decimals and precision arguments. It is also almost twice as fast as DecimalFormat df = new DecimalFormat("0.########", new DecimalFormatSymbols(Locale.US)); for (int i = 50000; --i >= 0; ) { ... target.append(df.format(source)); } which takes 421ms on my end. Tests run on Apple JVM 1.4.2
(In reply to comment #1) Slight correction > /* append integer part*/ > target.append(intPart); Should be: target.append(Math.abs(intPart));
... and one more correction ;-) The scale should also depend on whether Math.abs(intPart) > 1. Increasing the number of iterations by a factor of 10, the difference with Double.toString() remains the same. Double.toString() is sometimes faster, probably as a result of the values generated by Math.random(). The difference with DecimalFormat.format(), on the other hand increases. 1650-1700ms vs. almost 4000ms for the latter. I suspect this difference to be caused by the fact that DecimalFormat.format() is an instance method, where the other two are statics.
Created attachment 25508 [details] The proposed patch Add an implementation of the format method as described. The main format method is split between formatFast and formatPrecise, using one or the other depending on the source and precision input. I will attach a test case shortly after.
Created attachment 25509 [details] The test case The test case contains a simple test (with fixed values), a random test, and a performance comparison (which is disabled).
Comment on attachment 25508 [details] The proposed patch Changed MIME Type to text/plain
Maybe this bug should be put in XML Graphics Commons instead of FOP, so that all the XML Graphics projects would benefit from it. What do you think?
Created attachment 25585 [details] Patch for PSGenerator using the new class
Created attachment 25586 [details] Patch for PDFNumber using the new class
We could also use the new fast class in PCLGenerator and in IFUtil. (I will provide the patchs later if this is required).
Nice job, Julien. I reviewed DoubleFormatUtil and DoubleFormatUtilTest. DoubleFormatUtil contains requested static method formatDouble, and few other utility methods that may be used for double maniupation. As test case shows, all requested (and some additional) examples are formatted right: + 0.0 should be rendered as "0" (true) + 0.1 should be rendered as "0.1" (true) + 1234.1 should be rendered as "1234.1" (true) + 1234.1234567 should be rendered as "1234.1235" (note the trailing 5! Rounding!) + 1234.00001 should be rendered as "1234" (true) + 0.00001 should be rendered as "0.00001" (here you see the effect of the + "precision" parameter. (true, need to add test case) + 0.00000001 should be rendered as "0.00000001". (true) + 0.000000001 should be rendered as "0". (true) Also, acceptance criteria are fulfilled: + The method has to prove to be faster than new DecimalFormat("0.######", new DecimalFormatSymbols(Locale.US)).format(value) on Sun Java 1.4, 1.5 an 6.0. (true) 1.4.2_05 - 672 ms for formatDecimal compared to 1782 ms ms for DecimalFormat.format 1.5.0_11 - 562 ms for formatDecimal compared to 1625 ms for DecimalFormat.format 1.6.0_24 - 485 ms for formatDecimal compared to 1188 ms for DecimalFormat.format Tested on Windows XP SP3, Intel Core 2 Duo, E6550, 2.33 GHz. + The method must be thread-safe. (true) Static method, using only local variables. + The method must have a javadoc comment. (true) + It must be accompanied by a JUnit TestCase testing all aspects of the method. (true) + It should not depend on any additional library not already in the dependency (true) list of Apache XML Graphics Commons (its destination in the end). Commiters, could you please add attachments 1 and 2 to the code base. Note that attachments 1 and 2 are for packages org.apache.xmlgraphics.util. If you prefer putting them into package org.apache.fop.util, please change the package declaration. Attachments 3 and 4 are not tested, I propose that those attachments be part of solution for bug #51150.
While looking at the patch something struck me as odd in the requirements. Taking the same example of decimals being 4 and precision 8, 1.00001 will be rounded to 1, while 0.99999999 will be kept as is. Yet the latter is much closer to 1 than the former (1000 times, to be precise). So if rounding had to be done, that would rather have to be the other way round, that is: keep 1.00001 as is and round 0.99999999 up to 1. It seems to me that if drawing commands are used with numbers around 1, then the result is likely to look sub-optimal. Especially since transformation matrices can potentially scale up rounding errors quite a bit. At the other end of the spectrum, 12345678901.2345 will be output as is (15 significant digits), while 1.23456789012345 will be rounded to 1.2346 (5 significant digits). What justifies to have 15 significant digits on one hand, and only 5 on the other hand? I understand that this exercise has to do with outputting real numbers to PDF or PostScript (or other formats). According to the PDF Reference, Acrobat uses single-precision floating-point numbers to represent reals (the 'float' type in Java). Given that a float has approximately 8 significant digits —which is actually what the PostScript Reference puts in its "Implementation Limits" section— I suggest that we use this as the sole parameter. Now, would we really want this parameter to be configurable? Using a smaller number of significant digits probably makes no sense; using a higher number may allow to achieve higher precision with those applications (viewers or printers) that go beyond Acrobat and use doubles to represent real numbers. Such applications may actually become more and more common with the rise of 64-bit processors. Then again, will that increased precision make a significant difference in the context of drawing graphics? Any thoughts on this are welcome. As a side note, if we choose to fix the precision to 8 then the implementation becomes almost trivial: just convert the double into a float and 'de-scientificize' the output of toString. Thanks, Vincent
resetting P2 open bugs to P3 pending further review
(In reply to comment #5) > Created attachment 25508 [details] > The proposed patch > > Add an implementation of the format method as described. > The main format method is split between formatFast and formatPrecise, > using one or the other depending on the source and precision input. > > I will attach a test case shortly after. hi julien, in order to commit your non-trivial patch, it is necessary to have you submit an individual contributor license agreement, about which see http://www.apache.org/licenses/icla.txt; once you submit and you have been listed http://people.apache.org/committer-index.html#unlistedclas, then i can proceed to commit your patch to xmlgraphics-commons; thanks, glenn
increase priority for bugs with a patch
Other than o.a.f.pdf.PDFNumber, I found two more occurrences of DecimalFormatCache: 1. o.a.f.pdf.PDFColorHandler 2. o.a.f.render.intermediate.IFUtil Those need to have DecimalFormatCache appropriately replaced with DecimalFormatUtil, before we can safely remove DecimalFormatCache. I am working on the patches for those two classes.
(In reply to comment #17) > Other than o.a.f.pdf.PDFNumber, I found two more occurrences of > DecimalFormatCache: > > 1. o.a.f.pdf.PDFColorHandler > 2. o.a.f.render.intermediate.IFUtil > > Those need to have DecimalFormatCache appropriately replaced with > DecimalFormatUtil, before we can safely remove DecimalFormatCache. > > I am working on the patches for those two classes. just a reminder, but i will need a ICLA filed by Julien in order to apply his patch; see comment 15
(In reply to comment #18) > (In reply to comment #17) > > Other than o.a.f.pdf.PDFNumber, I found two more occurrences of > > DecimalFormatCache: > > > > 1. o.a.f.pdf.PDFColorHandler > > 2. o.a.f.render.intermediate.IFUtil > > > > Those need to have DecimalFormatCache appropriately replaced with > > DecimalFormatUtil, before we can safely remove DecimalFormatCache. > > > > I am working on the patches for those two classes. > > just a reminder, but i will need a ICLA filed by Julien in order to apply his > patch; see comment 15 Please also see comment 13. I think some modifications to the patch are necessary. Vincent
Vincent, (In reply to comment #19) > Please also see comment 13. I think some modifications to the patch are > necessary. I don't know why Jeremias asked for two parameters for number of decimal places (one general, and one for [-1,+1] range). Maybe he planned to use this patch also for "transformation matrices" with values in range [-1,+1], but that's just a guess. Anyway, I believe that we could use this patch without problems since it really does not change current behavior of the trunk. The trunk will always call format method with those two parameters being equal: (3,3), (4,4), (8,8), etc. If you think that there should be change in the way FOP handles doubles, in terms of always using 8 decimal places, is it possible to open separate issue? I would really like to close this one, and potentially solve memory leak problems introduced by ThreadLocal. -Ognjen
Glenn, (In reply to comment #18) > just a reminder, but i will need a ICLA filed by Julien in order to apply his > patch; see comment 15 I understand. Let me test all Julien's patches first, to see if there are still usable. Then we could think about ICLA. -Ognjen
(In reply to comment #20) > Vincent, > > (In reply to comment #19) > > Please also see comment 13. I think some modifications to the patch are > > necessary. > > I don't know why Jeremias asked for two parameters for number of decimal places > (one general, and one for [-1,+1] range). Maybe he planned to use this patch > also for "transformation matrices" with values in range [-1,+1], but that's > just a guess. > > Anyway, I believe that we could use this patch without problems since it really > does not change current behavior of the trunk. The trunk will always call > format method with those two parameters being equal: (3,3), (4,4), (8,8), etc. > > If you think that there should be change in the way FOP handles doubles, in > terms of always using 8 decimal places, is it possible to open separate issue? > I would really like to close this one, and potentially solve memory leak > problems introduced by ThreadLocal. i'm more interested in solving the immediate problem than changing the way FOP handles doubles; i agree it could be a separate issue (if there is a desire to address it... but i don't know there is a need)
Created attachment 28686 [details] Unified patch for PDFColorHandler and IFUtil
(In reply to comment #17) > Other than o.a.f.pdf.PDFNumber, I found two more occurrences of > DecimalFormatCache: > > 1. o.a.f.pdf.PDFColorHandler > 2. o.a.f.render.intermediate.IFUtil > > Those need to have DecimalFormatCache appropriately replaced with > DecimalFormatUtil, before we can safely remove DecimalFormatCache. > > I am working on the patches for those two classes. Unified patch is attached. All the patches attached to this issue are functional. After applying all patches to xmlgraphics-commons and fop, one may delete DecimalFormatCache class safely... and solve memory leaks. -Ognjen
Hi everyone, sorry for the late response, I was not aware that a ICLA was required for a patch. I will gladly sign it, so how should I do it ? A scan of the signed ICLA attached to this bug will do? Regards, Julien
> A scan of the signed ICLA attached to this bug will do? No, I think you should send scan to the secretary@apache.org. Plase read: http://www.apache.org/licenses/#clas -Ognjen
(In reply to comment #25) > Hi everyone, > > sorry for the late response, I was not aware that a ICLA was required for a > patch. > I will gladly sign it, so how should I do it ? > A scan of the signed ICLA attached to this bug will do? > > Regards, > Julien it depends on the size and nature of the patch; if small and trivial, then no ICLA is needed; it was my judgment that an ICLA is needed in this case due to the amount of code
Hi everyone, I've just submitted the ICLA to secretary@apache.org. Thank you all.
By the way, I don't know what is the policy about @author tags in java files here in XmlGraphics/FOP, but feel free to remove them if the policy is "no tags" (I think there is such a policy in Apache Commons projects)
Hi, my ICLA has been received and filed. So the patch is good to go :-)
Patches applied at: http://svn.apache.org/viewvc?rev=1331933&view=rev (XMLGraphicsCommons) http://svn.apache.org/viewvc?rev=1331939&view=rev (XMLGraphicsCommons) http://svn.apache.org/viewvc?rev=1331950&view=rev (Fop) Please verify and close if satisfied. Thanks Julien and Ognjen for patches, Jeremmias for the original report, and Andreas for additional analysis!
Thank you. -Ognjen
Created attachment 28846 [details] simple test that throws an exception The attached simple example throws this exception: Exception in thread "main" java.lang.NumberFormatException: For input string: "0.005859375" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) at java.lang.Integer.parseInt(Integer.java:458) at java.lang.Integer.parseInt(Integer.java:499) at org.apache.xmlgraphics.util.DoubleFormatUtil.formatDoublePrecise(DoubleFormatUtil.java:120) at org.apache.xmlgraphics.util.DoubleFormatUtil.formatDouble(DoubleFormatUtil.java:59) at org.apache.fop.pdf.PDFNumber.doubleOut(PDFNumber.java:82) at bugs.DoubleFormatUtilBug.main(DoubleFormatUtilBug.java:13) I am puzzled... This does not happen with FOP-1.0 so I assume this is caused by the new code associated with this ticket.
(In reply to comment #33) > Created attachment 28846 [details] > simple test that throws an exception > > The attached simple example throws this exception: > > Exception in thread "main" java.lang.NumberFormatException: For input > string: "0.005859375" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) > at java.lang.Integer.parseInt(Integer.java:458) > at java.lang.Integer.parseInt(Integer.java:499) > at > org.apache.xmlgraphics.util.DoubleFormatUtil. > formatDoublePrecise(DoubleFormatUtil.java:120) > at > org.apache.xmlgraphics.util.DoubleFormatUtil.formatDouble(DoubleFormatUtil. > java:59) > at org.apache.fop.pdf.PDFNumber.doubleOut(PDFNumber.java:82) > at bugs.DoubleFormatUtilBug.main(DoubleFormatUtilBug.java:13) > > I am puzzled... This does not happen with FOP-1.0 so I assume this is caused > by the new code associated with this ticket. I can see that the code erroneously assumes (without checking) that s contains an 'E'. If it does not contain 'E' but does contain '.', then parseInt is handed a string containing a '.', which throws the NumberFormatException. } else { // Scientific representation of double: "x.xxxxxEyyy" int dot = s.indexOf('.'); int exp = s.indexOf('E'); int exposant = Integer.parseInt(s.substring(exp + 1)); Julien or Ognjen, can you please submit a patch and test for this bug?
I will attach both a test case and a patch, later this afternoon.
The bug is trivial: "if (source >= 10e-3 && source < 1e7) {" is wrong ! "if (source >= 1e-3 && source < 1e7) {" is right. My mistake when reading 10 exp -3 in Double.toString javadoc and writing 10e-3 instead of 1e-3. Patch will follow.
Created attachment 28850 [details] patch for both class and test case (DoubleFormatUtil.java and DoubleFormatUtilTest.java)
(In reply to comment #36) > The bug is trivial: > "if (source >= 10e-3 && source < 1e7) {" is wrong ! > "if (source >= 1e-3 && source < 1e7) {" is right. > > My mistake when reading 10 exp -3 in Double.toString javadoc and writing > 10e-3 instead of 1e-3. > > Patch will follow. i'm not satisfied with this patch, since it depends upon the implementation specific nature of Double.toString(); i would like to see better checking in the code I listed in comment 34 to remove assumptions about the use of scientific notation (and assuming that 'E' is present)
(In reply to comment #38) > (In reply to comment #36) > > The bug is trivial: > > "if (source >= 10e-3 && source < 1e7) {" is wrong ! > > "if (source >= 1e-3 && source < 1e7) {" is right. > > > > My mistake when reading 10 exp -3 in Double.toString javadoc and writing > > 10e-3 instead of 1e-3. > > > > Patch will follow. > > i'm not satisfied with this patch, since it depends upon the implementation > specific nature of Double.toString(); i would like to see better checking in > the code I listed in comment 34 to remove assumptions about the use of > scientific notation (and assuming that 'E' is present) for example, what happens if source is NaN? just looking at the code makes me think the same else clause cited in comment 34 will apply, which means yet again, that parseInt will be invoked on the string "NaN"
(In reply to comment #39) > (In reply to comment #38) > > (In reply to comment #36) > > > The bug is trivial: > > > "if (source >= 10e-3 && source < 1e7) {" is wrong ! > > > "if (source >= 1e-3 && source < 1e7) {" is right. > > > > > > My mistake when reading 10 exp -3 in Double.toString javadoc and writing > > > 10e-3 instead of 1e-3. > > > > > > Patch will follow. > > > > i'm not satisfied with this patch, since it depends upon the implementation > > specific nature of Double.toString(); i would like to see better checking in > > the code I listed in comment 34 to remove assumptions about the use of > > scientific notation (and assuming that 'E' is present) > > for example, what happens if source is NaN? just looking at the code makes > me think the same else clause cited in comment 34 will apply, which means > yet again, that parseInt will be invoked on the string "NaN" in this regard, please add NaN, +Infinity, and -Infinity to your test cases; also add the following test cases: * 1E-3 + Double.MIN_VALUE * 1E-3 * 1E-3 - Double.MIN_VALUE * 1E7 + Double.MIN_VALUE * 1E7 * 1E7 - Double.MIN_VALUE
My understanding of Double.toString javadoc is that the contract described in the javadoc is mandatory for every implementation of the JVM. But since I may be wrong, feel free to correct the patch. I implemented this feature using Double comparison since I felt it was faster than doing another indexOf on the String (I may be wrong again :-) ). If you feel strongly for a second check I'll do it.
(In reply to comment #41) > My understanding of Double.toString javadoc is that the contract described > in the javadoc is mandatory for every implementation of the JVM. I agree with that. Otherwise the format return by toString wouldn't be described in such details. However, NaN and Infinity do have to be checked. > But since I may be wrong, feel free to correct the patch. > I implemented this feature using Double comparison since I felt it was > faster than doing another indexOf on the String (I may be wrong again :-) ). > > If you feel strongly for a second check I'll do it. Vincent
Created attachment 28855 [details] patch for both class and test case (DoubleFormatUtil.java and DoubleFormatUtilTest.java) Added test case for limit conditions, and fixed DoubleFormatUtil accordingly. There was a bug in the private format(StringBuffer, ...) method, when checking against tenPow(scale) if scale >= 19 (since 10^19 > Long.MAX_VALUE). There was also a bug in formatDoublePrecise when trying to reconstruct the full decimal part from the scientific representation, when decLength < digits. Again, the tests at limits showed these bugs, and I fixed them.
patched at: http://svn.apache.org/viewvc?rev=1343794&view=rev http://svn.apache.org/viewvc?rev=1343800&view=rev if there is another double format bug, please file a new bug report rather than reopening a prior closed bug
I've just created a new bug concerning DoubleFormatUtil when formating double near Double.MIN_VALUE (I already attached a test case + patch): see Bug #53327