PDFBox / PDFBOX-5499

Performance issue since 2.0.18

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.19
    • Fix Version/s: 2.0.27, 3.0.0 PDFBox
    • Component/s: PDModel
    • Labels: None

    Description

      Our PDF is parsed in less than 200 ms with 2.0.18 but takes more than 8 seconds with 2.0.19. The issue is still present in 2.0.26.

      SmallMap was introduced in version 2.0.19, and we have faced a performance issue since that change.

      We patched our code by replacing the SmallMap implementation as follows:

      package org.apache.pdfbox.util;
      
      import java.util.LinkedHashMap;
      
      public class SmallMap<K, V> extends LinkedHashMap<K, V> {
          // intentionally empty: fall back to the standard LinkedHashMap
      }

      With this patch, the performance issue disappears.

      Our test is really simple:

          long start = System.currentTimeMillis();
          try (PDDocument document = PDDocument.load(new File(inFile))) {
              // intentionally empty: only parsing time is measured
          }
          long duration = System.currentTimeMillis() - start;

          assertTrue(duration < 500);
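
A single wall-clock sample like the one above can be noisy (JIT warm-up, file system caching). A minimal sketch of a repeated measurement follows; `ParseTimer` and `medianMillis` are hypothetical names used for illustration and are not part of PDFBox or of the original test.

```java
import java.util.Arrays;

// Hypothetical helper (not part of PDFBox): runs a task several times
// and reports the median elapsed time in milliseconds.
final class ParseTimer {

    private ParseTimer() {
    }

    static long medianMillis(int runs, Runnable task) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            // Convert elapsed nanoseconds to whole milliseconds.
            samples[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(samples);
        // Middle element; for an even number of runs this is the upper median.
        return samples[runs / 2];
    }
}
```

In the test above, the task would wrap the `PDDocument.load` call (converting its checked `IOException` to an unchecked one), and the assertion would compare the returned median against the 500 ms threshold.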

      I understand that SmallMap solves problems in some cases, but would it be possible to add a factory that creates this map, so that users can configure which Map implementation is used?
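
To make the request concrete, here is a rough sketch of what such a factory could look like. All names (`MapFactory`, `MapFactories`, `setFactory`, `newMap`) are hypothetical; nothing like this is known to exist in PDFBox, and the default shown here is only an assumption for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical interface: creates the Map used to store dictionary entries.
interface MapFactory {
    <K, V> Map<K, V> newMap();
}

// Hypothetical holder for the globally configured factory.
final class MapFactories {

    // Assumed default for this sketch: an insertion-ordered hash map.
    static final MapFactory LINKED_HASH = new MapFactory() {
        @Override
        public <K, V> Map<K, V> newMap() {
            return new LinkedHashMap<>();
        }
    };

    // volatile so that a factory set by one thread is visible to others.
    private static volatile MapFactory current = LINKED_HASH;

    private MapFactories() {
    }

    // Lets an application choose which Map implementation is created.
    static void setFactory(MapFactory factory) {
        current = Objects.requireNonNull(factory, "factory");
    }

    // Call sites would use this instead of instantiating a map class directly.
    static <K, V> Map<K, V> newMap() {
        return current.newMap();
    }
}
```

With something along these lines, a memory-optimized map could remain the default while applications with large dictionaries opt into a hash-based map by calling `MapFactories.setFactory(...)` once at startup.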

      Attachments

        1. image-2022-09-05-19-55-40-753.png (16 kB, Thomas Debray Luyat)
        2. image-2022-09-05-17-40-22-416.png (26 kB, Thomas Debray Luyat)
        3. image-2022-09-05-17-37-55-155.png (27 kB, Thomas Debray Luyat)
        4. image-2022-09-05-12-48-04-608.png (103 kB, Thomas Debray Luyat)

            People

              Assignee: Tilman Hausherr (tilman)
              Reporter: Thomas Debray Luyat (tdebray)