TIKA-1999

org.apache.tika.sax.ToXMLContentHandler$ElementInfo.getPrefix(ToXMLContentHandler.java:58)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13
    • Fix Version/s: 1.14, 2.0.0
    • Component/s: parser
    • Labels: None
    • Environment: Ubuntu 16.04 (64 bit)
      Oracle Java 1.8.0_91-b14 (64 bit)

    Description

      When trying to read the following PDF document:

      http://www.arcadiz.com/content/assets/Artikel_CloudWorks_Vernieuwingen_zorg_vragen_om_veel_snellere_verbindingen.pdf

      Tika crashes for me with a java.lang.StackOverflowError, caused by very deep recursion in:

          at org.apache.tika.sax.ToXMLContentHandler$ElementInfo.getPrefix(ToXMLContentHandler.java:58)
      

      For some reason, the Tika App doesn't exhibit this behavior, but the following minimal working example (MWE) reproduces the issue for me:

      import java.io.ByteArrayOutputStream;
      import java.io.File;
      import java.io.FileInputStream;
      import java.io.InputStream;
      import org.apache.tika.metadata.Metadata;
      import org.apache.tika.parser.AutoDetectParser;
      import org.apache.tika.parser.ParseContext;
      import org.apache.tika.sax.ToHTMLContentHandler;
      
      public class test
      {
          public static void main(String[] args) throws Exception {
              // Local copy of the PDF linked above.
              String p = "/home/eggie/faulty_pdf_document.pdf";
              
              AutoDetectParser tk = new AutoDetectParser();
              ByteArrayOutputStream os = new ByteArrayOutputStream();
              // ToHTMLContentHandler extends ToXMLContentHandler, which appears in the stack trace.
              ToHTMLContentHandler handler = new ToHTMLContentHandler(os, "UTF-8");
              ParseContext pc = new ParseContext();
              System.out.println("Parsing");
              // try-with-resources closes the stream even if parsing fails.
              try (InputStream input = new FileInputStream(new File(p))) {
                  tk.parse(input, handler, new Metadata(), pc);
              }
          }
      }
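
      To illustrate how a prefix lookup can overflow the stack as in the trace above, here is a minimal, self-contained sketch. It assumes the lookup walks a chain of parent elements recursively, using one stack frame per ancestor. The class PrefixLookupSketch, its Element type, and all method names are hypothetical and are not Tika's source; the iterative variant is shown only for contrast and is not claimed to be the fix applied in Tika.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of a parent-linked element stack.
// This is NOT Tika's ToXMLContentHandler implementation.
public class PrefixLookupSketch {

    static final class Element {
        final Element parent;                  // enclosing element, null at the root
        final Map<String, String> namespaces;  // namespace URI -> prefix declared on this element

        Element(Element parent, Map<String, String> namespaces) {
            this.parent = parent;
            this.namespaces = namespaces;
        }

        // Recursive lookup: consumes one stack frame per ancestor visited.
        String getPrefixRecursive(String uri) {
            String prefix = namespaces.get(uri);
            if (prefix != null) {
                return prefix;
            }
            return parent != null ? parent.getPrefixRecursive(uri) : "";
        }

        // Iterative lookup: constant stack depth regardless of nesting.
        String getPrefixIterative(String uri) {
            for (Element e = this; e != null; e = e.parent) {
                String prefix = e.namespaces.get(uri);
                if (prefix != null) {
                    return prefix;
                }
            }
            return "";
        }
    }

    // Builds `depth` nested elements below a root that declares uri -> prefix,
    // and returns the innermost element.
    static Element buildChain(int depth, String uri, String prefix) {
        Map<String, String> rootNamespaces = new HashMap<>();
        rootNamespaces.put(uri, prefix);
        Element current = new Element(null, rootNamespaces);
        for (int i = 0; i < depth; i++) {
            current = new Element(current, Collections.<String, String>emptyMap());
        }
        return current;
    }

    public static void main(String[] args) {
        // A chain this deep stands in for an element stack that keeps growing.
        Element deepest = buildChain(1_000_000, "urn:example", "ex");
        try {
            deepest.getPrefixRecursive("urn:example");
            System.out.println("recursive lookup finished");
        } catch (StackOverflowError e) {
            System.out.println("recursive lookup overflowed the stack");
        }
        System.out.println("iterative lookup: " + deepest.getPrefixIterative("urn:example"));
    }
}
```

      Whether the recursive call overflows depends on the JVM's thread stack size (the -Xss option), which may be one reason the same document behaves differently across setups; that is a guess, not something established in this report.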
      

          People

            Assignee: Tim Allison (tallison)
            Reporter: Egbert (MadEgg)
            Votes: 0
            Watchers: 3
