  Nutch / NUTCH-2223

Upgrade xercesImpl to 2.11.0 to fix a hang in Tika MIME type detection


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.11
    • Fix Version/s: 1.12
    • Component/s: parser
    • Labels:
      None

      Description

      The stack trace for the hang appears to be:

      at org.apache.xerces.impl.XMLScanner.scanExternalID(Unknown Source)
      at org.apache.xerces.impl.XMLDocumentScannerImpl.scanDoctypeDecl(Unknown Source)
      at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
      at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
      at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
      at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
      at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
      at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
      at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
      at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
      at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
      at org.apache.tika.detect.XmlRootExtractor.extractRootElement(XmlRootExtractor.java:54)
      at org.apache.tika.detect.XmlRootExtractor.extractRootElement(XmlRootExtractor.java:41)
      at org.apache.tika.mime.MimeTypes.getMimeType(MimeTypes.java:192)
      at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:439)
      at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
      at org.apache.tika.cli.TikaCLI$10.process(TikaCLI.java:252)
      at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
      at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:111)
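The trace shows the hang inside Tika's `XmlRootExtractor.extractRootElement`, which sniffs the root element of XML input with a JAXP `SAXParser`; the parser never gets that far because it is stuck in `scanDoctypeDecl` / `scanExternalID`. The following is a minimal JDK-only sketch of that kind of root-element sniffing, assuming the approach of aborting the parse at the first start tag. It is not Tika's actual code: the class `RootSniffer` and its `StopParsing` exception are hypothetical, and the method name only mirrors the frame in the trace.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not Tika's implementation: sniff the root element with JAXP SAX.
final class RootSniffer {

    /** Thrown from the handler to abort parsing as soon as the root element is seen. */
    private static final class StopParsing extends SAXException {
        StopParsing() {
            super("root element found");
        }
    }

    /** Returns the root element's QName, or null if no start tag could be parsed. */
    static QName extractRootElement(InputStream in) {
        final QName[] root = new QName[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                root[0] = new QName(uri, localName);
                throw new StopParsing(); // no need to scan the rest of the document
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // Any DOCTYPE (and its external ID) is scanned before the root start tag is reached;
            // that is the code path where the reported trace is stuck.
            factory.newSAXParser().parse(in, handler);
        } catch (SAXException e) {
            // Either our deliberate StopParsing or a well-formedness error; root[0] tells which.
        } catch (IOException | ParserConfigurationException e) {
            return null;
        }
        return root[0];
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><rss version=\"2.0\"><channel/></rss>";
        QName root = extractRootElement(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(root); // prints "rss"
    }
}
```

Because the sniffer stops at the first start tag, its cost is normally bounded by the size of the prolog, which is why a parser bug that loops while scanning the DOCTYPE turns a cheap detection step into a hang.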
      

        Attachments

        1. NUTCH-2223.patch
          1 kB
          Tien Nguyen Manh
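The frames in the trace name `org.apache.xerces.*` classes, i.e. a standalone xercesImpl jar picked up through JAXP, rather than the JDK's internal copy (which lives under `com.sun.org.apache.xerces.internal`). A small sketch for checking which parser the JVM actually resolves after the upgrade is shown below. It assumes that xercesImpl exposes a static `org.apache.xerces.impl.Version.getVersion()` method; the class is looked up reflectively so the sketch also compiles and runs when the jar is absent. `XercesCheck` and `standaloneXercesVersion` are hypothetical names.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic: report which SAX parser JAXP selects and which standalone Xerces is loaded.
final class XercesCheck {

    /** Returns the standalone Xerces-J version string, or null if org.apache.xerces is not on the classpath. */
    static String standaloneXercesVersion() {
        try {
            Class<?> version = Class.forName("org.apache.xerces.impl.Version");
            return (String) version.getMethod("getVersion").invoke(null);
        } catch (ReflectiveOperationException | ClassCastException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // With xercesImpl on the classpath this is typically org.apache.xerces.jaxp.SAXParserFactoryImpl;
        // without it, the JDK's internal factory is used.
        String factory = SAXParserFactory.newInstance().getClass().getName();
        System.out.println("SAXParserFactory: " + factory);

        String v = standaloneXercesVersion();
        System.out.println("Standalone Xerces: " + (v == null ? "not on classpath" : v));
    }
}
```

Running this with the same classpath as the crawler is a quick way to confirm that the 2.11.0 jar, and not an older copy pulled in transitively, is the one being used.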

              People

              • Assignee:
                markus17 Markus Jelsma
                Reporter:
                tiennm Tien Nguyen Manh
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated:
                  Resolved: