Uploaded image for project: 'PDFBox'
  1. PDFBox
  2. PDFBOX-4514

inefficient use of synchronized in PDICCBased.java

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Not A Bug
    • 2.0.15
    • None
    • Rendering
    • None

    Description

      PDICCBased.java uses synchronized with static variable, e.g. synchronized (LOG) . It doesn't look to me it really needs to do it this way. This is very inefficient when multiple threads process different PDF at the same time. Change it to synchronized (this) will improve the performance.

      https://github.com/apache/pdfbox/blob/3b16f3b4f42c61dd5fe990c586f60465f83a8ef8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/color/PDICCBased.java#L191

      Sample code simulates multiple threads process different PDF at the same time:

       

      public static void main(String[] args) throws IOException {
        for (int i = 0; i < 10; i++) { // just run multiple time
          doWork();
        }
      }
      
      private static void doWork() throws IOException {
        long startTime = System.currentTimeMillis();
        String pdfFilename = "<absolute path to your pdf file>"; // replace this with your test file
        System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
      
        PDDocument document = PDDocument.load(new File(pdfFilename));
        List<PDDocument> pdfPages = new Splitter().split(document);
      
        Map<Integer, PDDocument> pdfPagesWithIndex = new HashMap<>();
      
        for (int i = 0; i < pdfPages.size(); i++) {
          pdfPagesWithIndex.put(i, pdfPages.get(i));
        }
      
        // multiple threads running in parallel
        pdfPagesWithIndex.entrySet().parallelStream().forEach(entry -> {
          try {
            processPDF(entry.getKey(), entry.getValue());
          } catch (Exception e) {
            System.out.println(e);
          }
        });
      
        System.out.println("Convertion time: " + (System.currentTimeMillis() - startTime));
      
        try {
          document.close();
        } catch (IOException ignored) {
        }
      }
      
      private static void processPDF(int index, PDDocument pdfPage) throws IOException {
        PDFRenderer renderer = new PDFRenderer(pdfPage);
        try {
          renderer.renderImageWithDPI(0, 180, ImageType.RGB);
        } catch (IOException e) {
          System.out.println(e);
        }
      
        try {
          pdfPage.close();
        } catch (IOException ignored) {
        }
      }
      

      I observed by changing synchronized (LOG) to synchronized (this), the above code can have maybe 20-30% reduction in latency. If I do a thread dump, I can see many threads are blocked on synchronized (LOG).

       

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              doctortomapat Jason
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: