PDFBox / PDFBOX-4370

JempBox's ResourceEvent is extremely slow to initialize


    Details

    • Type: Task
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: 1.8.16
    • Fix Version/s: None
    • Component/s: JempBox
    • Labels:
      None

      Description

      In our new batch of Tika regression files, one of the new PDFs caused a timeout. It is not an infinite loop, but processing takes several minutes. This may not be fixable.

      Admittedly, the XMP is large, and there are quite a few events.

      This is the code that triggers the problem.

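                  import org.apache.jempbox.xmp.XMPMetadata;
                  import org.apache.jempbox.xmp.XMPSchemaMediaManagement;

                  // is: the XMP packet extracted from the PDF
                  XMPMetadata xmp = XMPMetadata.load(is);
                  XMPSchemaMediaManagement mmSchema = xmp.getMediaManagementSchema();
                  // Reading the history creates the ResourceEvent objects
                  mmSchema.getHistory();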
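      To gauge how many events getHistory() has to wrap in ResourceEvent objects, the packet can be inspected with the JDK's own namespace-aware DOM, independently of JempBox. This is a minimal sketch: HistoryCounter and countHistoryEvents are hypothetical names, and it assumes the history is stored as rdf:li entries under xmpMM:History, as in standard XMP.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: counts rdf:li entries under xmpMM:History in an XMP packet. */
public class HistoryCounter {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String XMP_MM_NS = "http://ns.adobe.com/xap/1.0/mm/";

    public static int countHistoryEvents(InputStream xmp) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required so getElementsByTagNameNS matches by namespace URI
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(xmp);
        int count = 0;
        NodeList histories = doc.getElementsByTagNameNS(XMP_MM_NS, "History");
        for (int i = 0; i < histories.getLength(); i++) {
            Element history = (Element) histories.item(i);
            // Counts every rdf:li descendant; in a typical History each event is one rdf:li
            count += history.getElementsByTagNameNS(RDF_NS, "li").getLength();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xmp = "<x:xmpmeta xmlns:x='adobe:ns:meta/'>"
            + "<rdf:RDF xmlns:rdf='" + RDF_NS + "'>"
            + "<rdf:Description rdf:about='' xmlns:xmpMM='" + XMP_MM_NS + "'"
            + " xmlns:stEvt='http://ns.adobe.com/xap/1.0/sType/ResourceEvent#'>"
            + "<xmpMM:History><rdf:Seq>"
            + "<rdf:li stEvt:action='created'/>"
            + "<rdf:li stEvt:action='saved'/>"
            + "</rdf:Seq></xmpMM:History>"
            + "</rdf:Description></rdf:RDF></x:xmpmeta>";
        InputStream in = new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8));
        System.out.println(countHistoryEvents(in)); // prints 2
    }
}
```

      Pointing countHistoryEvents at the packet from the attached file would show how many events sit behind the slowdown.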
      

      The slow part appears to be setting the xmlns:stEvt namespace declaration attribute when creating a new ResourceEvent. When I comment out the following call in ResourceEvent's initializer, processing is fast (about 1 second).

                  // Declares the stEvt namespace prefix on the element backing this ResourceEvent
                  parent.setAttributeNS(
                      XMPSchema.NS_NAMESPACE,
                      "xmlns:stEvt",
                      NAMESPACE );
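      One way to check whether this call by itself accounts for the time is to time it in isolation, with and without the declaration, mirroring the comparison above. The sketch below uses the JDK's built-in DOM rather than JempBox, so it may or may not reproduce the slowdown. NsDeclBench and its methods are hypothetical, the default of 20,000 entries is arbitrary, and it assumes XMPSchema.NS_NAMESPACE is the standard xmlns namespace URI (XMLConstants.XMLNS_ATTRIBUTE_NS_URI). It walks the entries through a live NodeList, a common DOM pattern; whether JempBox iterates the same way is not confirmed here.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical micro-benchmark: cost of declaring xmlns:stEvt on each rdf:li. */
public class NsDeclBench {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String STEVT_NS = "http://ns.adobe.com/xap/1.0/sType/ResourceEvent#";

    /** Builds an rdf:Seq holding n empty rdf:li elements. */
    static Document buildHistory(int n) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element seq = doc.createElementNS(RDF_NS, "rdf:Seq");
        doc.appendChild(seq);
        for (int i = 0; i < n; i++) {
            seq.appendChild(doc.createElementNS(RDF_NS, "rdf:li"));
        }
        return doc;
    }

    /** Visits every rdf:li through a live NodeList, optionally declaring the stEvt prefix on each. */
    static long walk(Document doc, boolean declareNs) {
        long start = System.nanoTime();
        NodeList items = doc.getElementsByTagNameNS(RDF_NS, "li");
        for (int i = 0; i < items.getLength(); i++) {
            Element li = (Element) items.item(i);
            if (declareNs) {
                // Mirrors the ResourceEvent call, assuming NS_NAMESPACE is http://www.w3.org/2000/xmlns/
                li.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:stEvt", STEVT_NS);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 20_000;
        long withoutDecl = walk(buildHistory(n), false);
        long withDecl = walk(buildHistory(n), true);
        System.out.printf("n=%d without=%d ms with=%d ms%n",
                n, withoutDecl / 1_000_000, withDecl / 1_000_000);
    }
}
```

      Running it for a few values of n shows how each variant scales: if the time with the declaration grows much faster than linearly while the time without it stays small, the cost comes from the mutation interacting with the traversal rather than from a single setAttributeNS call.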
      

        Attachments

        1. slow.zip (439 kB, Tim Allison)


            People

            • Assignee: Unassigned
            • Reporter: Tim Allison (tallison@apache.org)
            • Votes: 0
            • Watchers: 1
