  1. PDFBox
  2. PDFBOX-1337

Improve PDFOperator performance in multithreaded environments


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.7.1
    • Component/s: Parsing, Utilities
    • Labels: None

    Description

      With more than 6 threads, calls to PDFOperator#getOperator(String operator) still block each other:

      Sample thread dump with 48 threads:

      "pool-1-thread-46" - Thread t@72
      java.lang.Thread.State: RUNNABLE
      at org.apache.pdfbox.util.PDFOperator.getOperator(PDFOperator.java:76)
      at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:441)
      at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:46)
      at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:175)
      at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:187)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:266)

      I propose removing the synchronized wrapper around the "operators" map and synchronizing
      only the put operation. (This optimization reduces the time by 30 percent.)

      public class PDFOperator
      {
          [...]

          // private static Map operators = Collections.synchronizedMap( new HashMap() );
          private static Map operators = new HashMap();

          [...]

          public static PDFOperator getOperator( String operator )
          {
              PDFOperator operation = null;
              if( operator.equals( "ID" ) || operator.equals( "BI" ) )
              {
                  //we can't cache the ID operators.
                  operation = new PDFOperator( operator );
              }
              else
              {
                  operation = (PDFOperator)operators.get(operator);
                  if( operation == null )
                  {
                      synchronized (operators)
                      {
                          operation = (PDFOperator)operators.get(operator);
                          if ( operation == null )
                          {
                              operation = new PDFOperator( operator );
                              operators.put( operator, operation );
                          }
                      }
                  }
              }
              return operation;
          }

          [...]
      }
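
      One caveat with the snippet above: the first operators.get() runs outside the
      synchronized block, and java.util.HashMap does not support reads that race with a
      put(). The following is a minimal sketch of an alternative that still avoids locking
      on the read path, assuming java.util.concurrent (Java 5 or later) can be used.
      CachedOperator is a hypothetical stand-in for PDFOperator, and this is not
      necessarily the change that was committed for this issue.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for PDFOperator; only the caching logic is shown.
final class CachedOperator
{
    // ConcurrentHashMap gives non-blocking reads and thread-safe writes.
    private static final ConcurrentMap<String, CachedOperator> OPERATORS =
            new ConcurrentHashMap<String, CachedOperator>();

    private final String name;

    private CachedOperator( String name )
    {
        this.name = name;
    }

    String getName()
    {
        return name;
    }

    static CachedOperator getOperator( String operator )
    {
        if( operator.equals( "ID" ) || operator.equals( "BI" ) )
        {
            // As in the original code, ID and BI operators are never cached.
            return new CachedOperator( operator );
        }
        CachedOperator operation = OPERATORS.get( operator );
        if( operation == null )
        {
            // putIfAbsent returns the value already present, or null if ours was stored.
            CachedOperator created = new CachedOperator( operator );
            CachedOperator existing = OPERATORS.putIfAbsent( operator, created );
            operation = ( existing != null ) ? existing : created;
        }
        return operation;
    }
}
```

      Two threads that miss the cache at the same moment may each build an instance, but
      putIfAbsent ensures that both return the same cached one.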


          People

            Assignee: Timo Boehme (tboehme)
            Reporter: Alexis (dmytryk)
            Votes: 0
            Watchers: 3

              Time Tracking

                Estimated: 1h
                Remaining: 1h
                Logged: Not Specified