Batik / BATIK-1343

[PATCH] PNG compression level hint and TGA encoder for higher throughput bulk rasterization


Details

    • Patch

    Description

      Hello Apache. Several of my colleagues and I have been using the Batik library as a dependency-of-a-dependency in production for some time. We would like to contribute back the changes we made for our own use.

      I want to extend the gratitude of Amazon Web Services. We have found Batik very useful in AWS Elemental MediaConvert, as part of the rendering pipeline for the "style passthrough" feature documented here: https://docs.aws.amazon.com/mediaconvert/latest/ug/burn-in-output-captions.html

      We use SVG as an intermediate format when rendering captions, and this SVG data then gets rasterized by Batik. When we were doing our performance testing, we found that the PNG encoding used by Batik became a significant bottleneck.

      This patch series, which I authored on behalf of Amazon Web Services, adds a tunable compression level to the PNG encoder and a new TGA raster format to the Batik library and its demo rasterizer app, allowing the user to choose different tradeoffs between compression ratio and encoding time.

      I originally developed these patches against Batik 1.2, but I have ported them to the latest trunk (r1904320) and retested them thoroughly. I also ensured that each individual patch in the series passes unit tests.

      Patches

      These patches are meant to be applied in this order.

      1. png_patch_main.diff adds a tunable parameter for the zlib compression level to the internal PNG encoder using the "hints" mechanism, exposing the functionality to other Java software that calls into the Batik library. Previously, Batik always used the highest compression level, 9, when encoding PNGs. My benchmark results below show that lower levels achieve substantially higher throughput for large bulk conversions at a modest cost in compression ratio: level 1 finished in about 35 s versus about 93 s for level 9, while producing 40M of output versus 31M.
      2. png_patch_rasterizer_app.diff exposes the PNG compression level hint through the included svgrasterizer demo app.
      3. tga_patch_main.diff adds an encoder for the Truevision Targa (TGA) raster file format. TGA is an older format that uses simple RLE compression, which makes it faster than even the lowest PNG compression level, at the expense of a worse compression ratio (in the benchmark below, about 28 s and 79M of output versus about 35 s and 40M for PNG level 1). TGA could be a good choice when the highest throughput is desired, but it is limited to 8 bits per channel, and my encoder does not implement the palette-based (color-mapped) modes. Since I cannot include binary files in a patch, when applying this patch you must also extract tga_unit_test_files.zip into your Batik tree; it contains the resources needed by the unit tests I added. I tried to cover all edge cases in the RLE algorithm with crafted pixel contents in the unit test input files.
      4. tga_patch_rasterizer_app.diff exposes the TGA encoder through the included svgrasterizer demo app.
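      To illustrate the RLE scheme that makes TGA cheap to encode, here is a minimal sketch of run-length encoding one scanline of 32-bit pixels, using the packet layout from the Truevision TGA 2.0 specification: a header byte whose high bit marks a run-length packet and whose low 7 bits hold the pixel count minus one, followed by a single pixel (run-length packet) or by that many literal pixels (raw packet). This is not the code in tga_patch_main.diff; the class and method names are hypothetical, and the sketch assumes 32-bit BGRA pixels, packets that never span scanlines, and a minimum run of two pixels. A complete file would also need the 18-byte TGA header with image type 10 (run-length encoded true-color).

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of TGA run-length encoding for a single 32-bit scanline. */
public final class TgaRleSketch {
    // A packet covers 1..128 pixels; the header stores (count - 1) in its low 7 bits.
    private static final int MAX_PACKET = 128;

    // TGA stores 32-bit true-color pixels in B, G, R, A byte order.
    private static void writePixel(ByteArrayOutputStream out, int argb) {
        out.write(argb & 0xFF);          // blue
        out.write((argb >>> 8) & 0xFF);  // green
        out.write((argb >>> 16) & 0xFF); // red
        out.write((argb >>> 24) & 0xFF); // alpha
    }

    private static int readPixel(byte[] data, int p) {
        return (data[p] & 0xFF)
                | (data[p + 1] & 0xFF) << 8
                | (data[p + 2] & 0xFF) << 16
                | (data[p + 3] & 0xFF) << 24;
    }

    // Number of identical pixels starting at i, capped by the packet limit and the row end.
    private static int runLength(int[] row, int i) {
        int n = 1;
        while (i + n < row.length && n < MAX_PACKET && row[i + n] == row[i]) n++;
        return n;
    }

    /** Encodes one scanline of ARGB pixels into TGA RLE packets. */
    public static byte[] encodeRow(int[] row) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < row.length) {
            int run = runLength(row, i);
            if (run >= 2) {
                out.write(0x80 | (run - 1)); // run-length packet: high bit set
                writePixel(out, row[i]);
                i += run;
            } else {
                // Raw packet: collect pixels until a repeat starts or the packet is full.
                int start = i;
                int count = 0;
                while (i < row.length && count < MAX_PACKET && runLength(row, i) < 2) {
                    i++;
                    count++;
                }
                out.write(count - 1); // raw packet: high bit clear
                for (int k = start; k < start + count; k++) writePixel(out, row[k]);
            }
        }
        return out.toByteArray();
    }

    /** Inverse of encodeRow, used here only to check the round trip. */
    public static int[] decodeRow(byte[] data, int width) {
        int[] row = new int[width];
        int p = 0;
        int x = 0;
        while (x < width) {
            int header = data[p++] & 0xFF;
            int count = (header & 0x7F) + 1;
            if ((header & 0x80) != 0) {
                // Run-length packet: one pixel value repeated count times.
                int px = readPixel(data, p);
                p += 4;
                for (int k = 0; k < count; k++) row[x++] = px;
            } else {
                // Raw packet: count literal pixel values.
                for (int k = 0; k < count; k++) {
                    row[x++] = readPixel(data, p);
                    p += 4;
                }
            }
        }
        return row;
    }

    public static void main(String[] args) {
        int a = 0xFF112233, b = 0xFF445566, c = 0x80778899, d = 0x00000000;
        int[] row = {a, a, a, b, c, d, d};
        byte[] encoded = encodeRow(row);
        // RLE(3 x a) = 5 bytes, raw(b, c) = 9 bytes, RLE(2 x d) = 5 bytes
        System.out.println(encoded.length + " bytes"); // prints "19 bytes"
        System.out.println(java.util.Arrays.equals(row, decodeRow(encoded, row.length))); // prints "true"
    }
}
```

      With 4-byte pixels, a two-pixel run costs 5 bytes as a run-length packet versus 8 bytes of raw pixel data, which is why the sketch already switches to a run-length packet at two identical pixels.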

      Testing

      The original implementation against Batik 1.2 has been in production for around a year now and has proven itself to be solid. Existing unit tests are passing with this patch series, and new unit tests have been added.

      To demonstrate the performance benefits, I implemented a quick-and-dirty benchmark. The script rip_svgs.sh scrapes 292 SVG files from Wikimedia Commons, and benchmark.sh (which should be run from the top-level trunk directory) bulk-converts all of those SVGs in each encoding mode. For each mode, the script reports how long the conversion took and the total size of the output files, to give an idea of the tradeoff between throughput and compression ratio. A few of the SVGs fail to render; these failures are silently ignored.

      Although the Wikimedia SVGs I looked at all seemed innocuous enough, I did not personally verify that every single one is safe for work, which is why I'm distributing the script instead of the SVGs. The benchmark also runs the rasterizer with -scriptSecurityOff, which removes the security checks on scripts embedded in the SVGs, so keep that in mind too. Not liable for damages from running this benchmark, etc.

      Here are my own results, collected on a Core i7-5930K:

      Mode          real       user       sys       Size  Output directory
      TGA           0m27.714s  1m13.000s  0m2.529s  79M   /tmp/svgbenches/tga
      PNG level 1   0m35.124s  1m17.808s  0m2.538s  40M   /tmp/svgbenches/png1
      PNG level 2   0m35.884s  1m21.007s  0m2.443s  39M   /tmp/svgbenches/png2
      PNG level 3   0m36.440s  1m22.461s  0m2.645s  39M   /tmp/svgbenches/png3
      PNG level 4   0m39.022s  1m27.379s  0m2.671s  33M   /tmp/svgbenches/png4
      PNG level 5   0m41.311s  1m32.538s  0m2.600s  33M   /tmp/svgbenches/png5
      PNG level 6   0m41.878s  1m27.781s  0m2.545s  32M   /tmp/svgbenches/png6
      PNG level 7   0m42.547s  1m26.767s  0m2.220s  32M   /tmp/svgbenches/png7
      PNG level 8   1m0.883s   1m45.745s  0m1.958s  32M   /tmp/svgbenches/png8
      PNG level 9   1m33.160s  2m27.109s  0m2.784s  31M   /tmp/svgbenches/png9
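      The figures above measure the whole pipeline (rendering plus encoding). To look at the zlib level knob on its own, here is a rough sketch that compresses a synthetic RGBA buffer with java.util.zip.Deflater at levels 1 through 9 and reports output size and elapsed time. It is not part of the patch series, and it is only a stand-in for a real PNG encoder: the class name and the synthetic image are hypothetical, no PNG row filtering is applied, and each level is timed once without JIT warm-up, so its numbers are not comparable to the benchmark results. Deflater's default output is a zlib stream, the same container PNG uses for its IDAT data.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: how the zlib level alone trades compression time for output size. */
public final class ZlibLevelSweep {

    /** Compresses data as a zlib stream at the given level (0..9). */
    public static byte[] compress(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    /** Inflates a zlib stream whose uncompressed length is known in advance. */
    public static byte[] decompress(byte[] data, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            byte[] result = new byte[originalLength];
            int off = 0;
            while (off < originalLength) {
                int n = inflater.inflate(result, off, originalLength - off);
                if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("zlib stream ended early");
                }
                off += n;
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
    }

    /** Hypothetical test image: checkerboard of flat and gradient tiles, slight noise in blue. */
    public static byte[] syntheticRaster(int w, int h) {
        Random rnd = new Random(42); // fixed seed so every run sees the same bytes
        byte[] px = new byte[w * h * 4];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int i = (y * w + x) * 4;
                boolean flat = ((x / 64) + (y / 64)) % 2 == 0;
                px[i] = (byte) (flat ? 0 : x);         // R
                px[i + 1] = (byte) (flat ? 0 : y);     // G
                px[i + 2] = (byte) rnd.nextInt(4);     // B: small noise
                px[i + 3] = (byte) 0xFF;               // A: opaque
            }
        }
        return px;
    }

    public static void main(String[] args) {
        byte[] pixels = syntheticRaster(512, 512);
        for (int level = 1; level <= 9; level++) {
            long start = System.nanoTime();
            byte[] packed = compress(pixels, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level %d: %,d -> %,d bytes in %,d us%n",
                    level, pixels.length, packed.length, micros);
        }
    }
}
```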
      
      
      

      Attachments

        1. tga_unit_test_files.zip (87 kB)
        2. benchmark.sh (1.0 kB)
        3. png_patch_main.diff (7 kB)
        4. png_patch_rasterizer_app.diff (7 kB)
        5. tga_patch_rasterizer_app.diff (5 kB)
        6. rip_svgs.sh (1 kB)
        7. tga_patch_main.diff (28 kB)

        All attachments uploaded by Max Eliaser.


          People

            Assignee: Unassigned
            Reporter: Max Eliaser (MaxEliaserAWS)
