Nutch / NUTCH-2223

Upgrade xercesImpl to 2.11.0 to fix a hang in Tika MIME type detection

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.11
    • Fix Version/s: 1.12
    • Component/s: parser
    • Labels: None

    Description

      The stack trace for the hang appears to be:

      at org.apache.xerces.impl.XMLScanner.scanExternalID(Unknown Source)
      at org.apache.xerces.impl.XMLDocumentScannerImpl.scanDoctypeDecl(Unknown Source)
      at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
      at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
      at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
      at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
      at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
      at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
      at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
      at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
      at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
      at org.apache.tika.detect.XmlRootExtractor.extractRootElement(XmlRootExtractor.java:54)
      at org.apache.tika.detect.XmlRootExtractor.extractRootElement(XmlRootExtractor.java:41)
      at org.apache.tika.mime.MimeTypes.getMimeType(MimeTypes.java:192)
      at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:439)
      at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
      at org.apache.tika.cli.TikaCLI$10.process(TikaCLI.java:252)
      at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:417)
      at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:111)
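
      The trace shows the parser stuck while scanning the external ID of a DOCTYPE declaration during root-element extraction. The following is a minimal sketch for checking whether the JAXP SAX parser on the classpath gets through such a declaration within a time limit. It is not Tika's XmlRootExtractor code and it does not reproduce the original hang, since the input that triggered it is not part of this issue. The class name DoctypeHangProbe, its methods, the sample document, and the 5 second limit are all hypothetical choices for illustration.

      ```java
      import java.io.ByteArrayInputStream;
      import java.nio.charset.StandardCharsets;
      import java.util.concurrent.Callable;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.Future;
      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.TimeoutException;
      import javax.xml.parsers.SAXParser;
      import javax.xml.parsers.SAXParserFactory;
      import org.xml.sax.Attributes;
      import org.xml.sax.InputSource;
      import org.xml.sax.SAXException;
      import org.xml.sax.helpers.DefaultHandler;

      // Hypothetical probe class; not part of Nutch, Tika, or Xerces.
      public class DoctypeHangProbe {

          /** Returns the local name of the root element, or null if none could be read. */
          static String rootElement(byte[] xml) throws Exception {
              SAXParserFactory factory = SAXParserFactory.newInstance();
              factory.setNamespaceAware(true);
              SAXParser parser = factory.newSAXParser();
              final String[] root = new String[1];
              DefaultHandler handler = new DefaultHandler() {
                  @Override
                  public InputSource resolveEntity(String publicId, String systemId) {
                      // Never fetch the external DTD; hand back an empty one instead.
                      return new InputSource(new ByteArrayInputStream(new byte[0]));
                  }

                  @Override
                  public void startElement(String uri, String local, String qName,
                                           Attributes atts) throws SAXException {
                      root[0] = local;
                      // Abort the parse: only the first element is of interest.
                      throw new SAXException("root element found");
                  }
              };
              try {
                  parser.parse(new ByteArrayInputStream(xml), handler);
              } catch (SAXException e) {
                  // Either the deliberate abort above or malformed input;
                  // root[0] stays null in the latter case.
              }
              return root[0];
          }

          /** Runs rootElement on a daemon thread and fails if it exceeds the limit. */
          static String rootElementWithTimeout(byte[] xml, long seconds) throws Exception {
              ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
                  Thread t = new Thread(r, "doctype-probe");
                  t.setDaemon(true); // a hung parse must not keep the JVM alive
                  return t;
              });
              Callable<String> task = () -> rootElement(xml);
              Future<String> result = pool.submit(task);
              try {
                  return result.get(seconds, TimeUnit.SECONDS);
              } catch (TimeoutException e) {
                  result.cancel(true);
                  throw new IllegalStateException(
                      "parser did not finish within " + seconds + "s", e);
              } finally {
                  pool.shutdownNow();
              }
          }

          public static void main(String[] args) throws Exception {
              // Sample document with a DOCTYPE that carries an external ID.
              String doc = "<?xml version=\"1.0\"?>\n"
                  + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                  + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
                  + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";
              System.out.println(
                  rootElementWithTimeout(doc.getBytes(StandardCharsets.UTF_8), 5));
          }
      }
      ```

      Running the probe with the old and the new xercesImpl jar on the classpath, against a document known to trigger the hang, would be one way to confirm that the upgrade helps. A timeout means the parser never returned from the DOCTYPE scan.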
      

      Attachments

        1. NUTCH-2223.patch
          1 kB
          Tien Nguyen Manh

            People

              Assignee: Markus Jelsma (markus17)
              Reporter: Tien Nguyen Manh (tiennm)
              Votes: 0
              Watchers: 4
