  1. PDFBox
  2. PDFBOX-5575

optimize LZWFilter

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.27, 3.0.0 PDFBox
    • Fix Version/s: 2.0.28, 3.0.0 PDFBox
    • Component/s: None
    • Labels: None
    • Flags: Patch

    Description

      I ran the PDFBox tests with a profiler and saw that LZWFilter took up a noticeable share of the run time, so I had a look at the code. I read it in isolation, without the surrounding context, and tried to understand what it does and what could be changed without altering the results.

      • made the private methods static
      • changed the variable/method parameter 'earlyChange' from integer to boolean because I thought that would be more readable
      • some minor tweaks
      • it looks like codeTable is initialized quite often, and every time 256 byte arrays of length 1 are created, so I pre-allocate those byte arrays once so that they can be shared by all code tables. tilman, I assumed that the contents of the codeTable entries are never modified, and my analysis of the code seems to confirm that (as do the passing unit tests). Please have a look at this so I don't break anything.
      • it took me some time to fully understand what findPatternCode() does and why it checks the codeTable in reverse order. I more or less rewrote that method from scratch, and I think it should now always be faster: for patterns of length 1 no iteration is done, and for longer patterns iteration stops as soon as the matching entry is found. As this is the most notable change, please take a closer look. Unit tests pass.
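
      The last two points can be sketched roughly as follows. This is not the attached patch, only a minimal illustration of the idea: the class name LzwTableSketch, the field SINGLE_BYTES and the method signatures are assumptions, and the layout (codes 0-255 for single bytes, 256 and 257 reserved, new entries from 258) is assumed from the usual LZW table convention. It also assumes that table entries are unique and never modified after creation, so sharing the arrays and returning the first match are both safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names and layout are assumptions, not the PDFBox code.
final class LzwTableSketch {

    // The 256 single-byte entries are allocated once and shared by every table.
    // This is only safe as long as nobody writes into these arrays.
    private static final byte[][] SINGLE_BYTES = new byte[256][];
    static {
        for (int i = 0; i < 256; i++) {
            SINGLE_BYTES[i] = new byte[] { (byte) i };
        }
    }

    // Builds a fresh table that reuses the shared single-byte arrays.
    static List<byte[]> createCodeTable() {
        List<byte[]> table = new ArrayList<>(4096);
        table.addAll(Arrays.asList(SINGLE_BYTES));
        table.add(null); // 256: reserved control code (assumed: clear table)
        table.add(null); // 257: reserved control code (assumed: end of data)
        return table;
    }

    // Returns the code for the given pattern, or -1 if it is not in the table.
    static int findPatternCode(List<byte[]> table, byte[] pattern) {
        // For single bytes the code equals the unsigned byte value: no search needed.
        if (pattern.length == 1) {
            return pattern[0] & 0xFF;
        }
        // Longer patterns can only live in the dynamic part of the table;
        // stop at the first match.
        for (int i = 258; i < table.size(); i++) {
            if (Arrays.equals(table.get(i), pattern)) {
                return i;
            }
        }
        return -1;
    }
}
```

      With this layout, creating a new table copies 256 references instead of allocating 256 arrays, and a lookup for a one-byte pattern costs a single mask operation.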

      Attachments

        1. optimize_LZWFilter.patch
          6 kB
          Axel Howind

          People

            Assignee: Tilman Hausherr
            Reporter: Axel Howind
            Votes: 0
            Watchers: 4
