Uploaded image for project: 'PDFBox'
  1. PDFBox
  2. PDFBOX-4514

inefficient use of synchronized in PDICCBased.java

VotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Bug
    • Affects Version/s: 2.0.15
    • Fix Version/s: None
    • Component/s: Rendering
    • Labels:
      None

      Description

      PDICCBased.java uses synchronized with static variable, e.g. synchronized (LOG) . It doesn't look to me it really needs to do it this way. This is very inefficient when multiple threads process different PDF at the same time. Change it to synchronized (this) will improve the performance.

      https://github.com/apache/pdfbox/blob/3b16f3b4f42c61dd5fe990c586f60465f83a8ef8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/color/PDICCBased.java#L191

      Sample code simulates multiple threads process different PDF at the same time:

       

      public static void main(String[] args) throws IOException {
        for (int i = 0; i < 10; i++) { // just run multiple time
          doWork();
        }
      }
      
      private static void doWork() throws IOException {
        long startTime = System.currentTimeMillis();
        String pdfFilename = "<absolute path to your pdf file>"; // replace this with your test file
        System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
      
        PDDocument document = PDDocument.load(new File(pdfFilename));
        List<PDDocument> pdfPages = new Splitter().split(document);
      
        Map<Integer, PDDocument> pdfPagesWithIndex = new HashMap<>();
      
        for (int i = 0; i < pdfPages.size(); i++) {
          pdfPagesWithIndex.put(i, pdfPages.get(i));
        }
      
        // multiple threads running in parallel
        pdfPagesWithIndex.entrySet().parallelStream().forEach(entry -> {
          try {
            processPDF(entry.getKey(), entry.getValue());
          } catch (Exception e) {
            System.out.println(e);
          }
        });
      
        System.out.println("Convertion time: " + (System.currentTimeMillis() - startTime));
      
        try {
          document.close();
        } catch (IOException ignored) {
        }
      }
      
      private static void processPDF(int index, PDDocument pdfPage) throws IOException {
        PDFRenderer renderer = new PDFRenderer(pdfPage);
        try {
          renderer.renderImageWithDPI(0, 180, ImageType.RGB);
        } catch (IOException e) {
          System.out.println(e);
        }
      
        try {
          pdfPage.close();
        } catch (IOException ignored) {
        }
      }
      

      I observed by changing synchronized (LOG) to synchronized (this), the above code can have maybe 20-30% reduction in latency. If I do a thread dump, I can see many threads are blocked on synchronized (LOG).

       

        Attachments

        Issue Links

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              doctortomapat Jason

              Dates

              • Created:
                Updated:
                Resolved:

                Issue deployment