Details
- New Feature
- Status: Open
- Major
- Resolution: Unresolved
- 1.16
- None
- Patch
Description
Hello Apache. Several of my colleagues and I have been using the Batik library as a dependency-of-a-dependency in production for some time. We would like to contribute back the changes we made for our own use.
I want to extend the gratitude of Amazon Web Services. We have found Batik very useful in AWS Elemental MediaConvert, as part of the rendering pipeline for the "style passthrough" feature documented here: https://docs.aws.amazon.com/mediaconvert/latest/ug/burn-in-output-captions.html
We use SVG as an intermediate format when rendering captions, and this SVG data then gets rasterized by Batik. When we were doing our performance testing, we found that the PNG encoding used by Batik became a significant bottleneck.
This patch series, which I authored myself on behalf of Amazon Web Services, adds some more raster formats to the Batik library and its demo rasterizer app, allowing the user to select different tradeoffs of compression ratio and encoding time.
I originally developed these patches against Batik 1.2, but I have ported them to the latest trunk (r1904320) and retested them thoroughly. I also ensured that each individual patch in the series passes unit tests.
Patches
These patches are meant to be applied in this order.
- png_patch_main.diff adds a tunable parameter for the zlib compression level to the internal PNG encoder using the "hints" mechanism, exposing the functionality to other Java software that calls into the Batik library. Previously, Batik always used the highest compression level, 9, when encoding PNGs. My benchmark results below show that other values can achieve higher throughput for large bulk conversions without too much cost in compression ratio.
- png_patch_rasterizer_app.diff exposes the PNG compression level hint through the included svgrasterizer demo app.
- tga_patch_main.diff adds an encoder for the Truevision Targa (TGA) raster file format. This older format uses simple RLE compression, making it dramatically faster than even the lowest PNG compression level, at the expense of a worse compression ratio. TGA could be a good choice when the highest throughput is desired, but it is limited to 8 bits per channel, and my encoder doesn't implement the paletted modes. Because I cannot include binary files in a patch, you must also extract tga_unit_test_files.zip into your Batik tree when applying this patch; it contains the resources needed by the unit tests I added. I tried to cover all edge cases in the RLE algorithm using crafted pixel contents in the unit test input files.
- tga_patch_rasterizer_app.diff exposes the TGA encoder through the included svgrasterizer demo app.
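For readers unfamiliar with TGA, here is a minimal sketch of the run-length packet scheme the format uses. It is not the encoder from tga_patch_main.diff; the class and method names are hypothetical. It assumes 32-bit pixels held as ARGB ints, written in BGRA byte order, and encodes a single scanline so that packets never cross a row boundary.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not the code in tga_patch_main.diff.
class TgaRleSketch {
    // A TGA packet header stores (count - 1) in 7 bits, so at most 128 pixels.
    private static final int MAX_PACKET = 128;

    /** Encodes one scanline of ARGB pixels as TGA RLE packets (32 bpp, BGRA order). */
    static byte[] encodeScanline(int[] argb) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < argb.length) {
            // Measure the run of identical pixels starting at i.
            int run = 1;
            while (i + run < argb.length && run < MAX_PACKET && argb[i + run] == argb[i]) {
                run++;
            }
            if (run >= 2) {
                // Run-length packet: high bit set, then a single pixel value.
                out.write(0x80 | (run - 1));
                writePixel(out, argb[i]);
                i += run;
            } else {
                // Raw packet: gather literals until a run begins or the packet is full.
                int start = i;
                int count = 1;
                i++;
                while (i < argb.length && count < MAX_PACKET
                        && !(i + 1 < argb.length && argb[i + 1] == argb[i])) {
                    count++;
                    i++;
                }
                out.write(count - 1); // high bit clear marks a raw packet
                for (int k = start; k < start + count; k++) {
                    writePixel(out, argb[k]);
                }
            }
        }
        return out.toByteArray();
    }

    private static void writePixel(ByteArrayOutputStream out, int argb) {
        out.write(argb & 0xFF);          // blue
        out.write((argb >>> 8) & 0xFF);  // green
        out.write((argb >>> 16) & 0xFF); // red
        out.write((argb >>> 24) & 0xFF); // alpha
    }

    public static void main(String[] args) {
        int a = 0xFF112233, b = 0xFF445566, c = 0xFF778899, d = 0xFFAABBCC;
        byte[] encoded = encodeScanline(new int[] {a, a, a, b, c, d, d});
        System.out.println("encoded bytes: " + encoded.length);
    }
}
```

Each packet costs one header byte plus either one pixel (a run) or N pixels (raw), which is why this kind of encoding is cheap to compute but compresses less tightly than Deflate.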
Testing
The original implementation against Batik 1.2 has been in production for around a year now and has proven itself to be solid. Existing unit tests are passing with this patch series, and new unit tests have been added.
To demonstrate the performance benefits, I implemented a quick-and-dirty benchmark. I wrote a script rip_svgs.sh to scrape 292 SVG files from Wikimedia Commons. I then wrote another script benchmark.sh (which should be run from the top-level trunk directory) to do bulk conversions of all SVGs in each encoding mode. The script shows how long each mode took to encode the SVGs and the total size of output files, to give an idea of the tradeoff of throughput vs compression ratio. A few of the SVGs fail to render, but these get ignored silently.
Although the Wikimedia SVGs I looked at all seemed innocuous enough, I did not personally verify that every single one is safe for work, which is why I'm distributing the script instead of the SVGs. The benchmark also runs the rasterizer with -scriptSecurityOff, so there's that too. Not liable for damages from running this benchmark etc etc etc.
Here are my own results, collected on a Core i7-5930K:
Mode                      real        user        sys        Size   Output directory
TGA                       0m27.714s   1m13.000s   0m2.529s   79M    /tmp/svgbenches/tga
PNG compression level 1   0m35.124s   1m17.808s   0m2.538s   40M    /tmp/svgbenches/png1
PNG compression level 2   0m35.884s   1m21.007s   0m2.443s   39M    /tmp/svgbenches/png2
PNG compression level 3   0m36.440s   1m22.461s   0m2.645s   39M    /tmp/svgbenches/png3
PNG compression level 4   0m39.022s   1m27.379s   0m2.671s   33M    /tmp/svgbenches/png4
PNG compression level 5   0m41.311s   1m32.538s   0m2.600s   33M    /tmp/svgbenches/png5
PNG compression level 6   0m41.878s   1m27.781s   0m2.545s   32M    /tmp/svgbenches/png6
PNG compression level 7   0m42.547s   1m26.767s   0m2.220s   32M    /tmp/svgbenches/png7
PNG compression level 8   1m0.883s    1m45.745s   0m1.958s   32M    /tmp/svgbenches/png8
PNG compression level 9   1m33.160s   2m27.109s   0m2.784s   31M    /tmp/svgbenches/png9
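The level-versus-time tradeoff in these results comes from zlib itself, so it can be explored in miniature without Batik. The sketch below uses only java.util.zip.Deflater from the JDK; it does not use the hint added by png_patch_main.diff, and the class name, method name, and synthetic input data are assumptions made for illustration. Timings from a toy loop like this are noisy and are no substitute for the benchmark above.

```java
import java.util.zip.Deflater;

// Hypothetical illustration only; does not call into Batik.
class DeflateLevelSketch {
    /** Returns the Deflate-compressed size of data at the given zlib level (0 to 9). */
    static int compressedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buf = new byte[8192];
            int total = 0;
            // Drain the compressor; only the byte count is kept.
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) {
        // Synthetic, loosely image-like data: short repeating bands with small variation.
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) ((i / 64) % 7 * 31 + (i % 5));
        }
        for (int level = 1; level <= 9; level++) {
            long t0 = System.nanoTime();
            int size = compressedSize(data, level);
            long micros = (System.nanoTime() - t0) / 1000;
            System.out.printf("level %d: %d bytes, %d us%n", level, size, micros);
        }
    }
}
```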