  Tika / TIKA-1781

Tika generates broken XML file

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.10
    • Fix Version/s: None
    • Component/s: general
    • Labels: None

    Description

      [ruby & tika-server-1.10] When the PDF file http://ratsinfo.dresden.de/getfile.php?id=52546&type=do is converted to XML, the resulting file contains the full extracted text, the metadata, and the XML structure twice. That output is not well-formed XML, so the XML parser I use downstream crashes on it.

      I also tried this file with givemetext.okfnlabs.org, which uses Tika-server plus OCR, and it extracted no text from the file at all.

      Thousands of other files are converted correctly; only this one fails.
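      A response that contains the same document twice cannot be parsed as one XML document, because well-formed XML allows exactly one root element. As a minimal sketch (in Python, not the reporter's Ruby setup), the check below shows how a client could detect such output before handing it to its parser. The `is_well_formed` helper and the shortened sample document are hypothetical illustrations, not part of Tika or tika-server.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if xml_text parses as a single XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for malformed input, including "junk after document element"
        # when a second root element follows the first one.
        return False
    return True


# Hypothetical, heavily shortened stand-in for an XHTML response.
single = '<html xmlns="http://www.w3.org/1999/xhtml"><head/><body><p>text</p></body></html>'
# The same document emitted twice, as described in this report.
doubled = single + single

print(is_well_formed(single))   # True
print(is_well_formed(doubled))  # False
```

      Such a check only rejects the broken output; it does not fix the duplication, which would still have to be addressed in Tika itself.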

          People

            Assignee: Unassigned
            Reporter: tranquillo
            Votes: 0
            Watchers: 2