Tika / TIKA-1228

Embedded files not extracted properly from PDF

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.5
    • Component/s: parser
    • Environment: CentOS 6.5 VM

    Description

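      According to the PDFBox example here:

      http://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractEmbeddedFiles.java

      embedded files can be listed under the Kids node of the embedded-files name tree when the Names node does not exist. The Tika PDF parser does not check the Kids node in that case, so those embedded files are not extracted.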
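
      The traversal the description asks for can be sketched as follows. This is a minimal, self-contained illustration and not Tika's or PDFBox's actual code: `NameTreeSketch`, `Node`, and `collectEmbeddedFiles` are hypothetical names, and file contents are modelled as plain strings. It assumes each node may carry a Names map, a Kids list, or both, and it descends recursively, which goes one step further than the single level of Kids described in the report.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NameTreeSketch {

    /** Hypothetical stand-in for a PDF name tree node; not a PDFBox class. */
    public static final class Node {
        final Map<String, String> names; // null when the node has no Names entry
        final List<Node> kids;           // null when the node has no Kids entry

        public Node(Map<String, String> names, List<Node> kids) {
            this.names = names;
            this.kids = kids;
        }
    }

    /** Collects every name-to-file entry, checking Kids as well as Names. */
    public static Map<String, String> collectEmbeddedFiles(Node root) {
        Map<String, String> found = new LinkedHashMap<>();
        collect(root, found);
        return found;
    }

    private static void collect(Node node, Map<String, String> found) {
        if (node == null) {
            return;
        }
        // Entries stored directly on this node.
        if (node.names != null) {
            found.putAll(node.names);
        }
        // Entries stored on child nodes; this is the case the report says is missed.
        if (node.kids != null) {
            for (Node kid : node.kids) {
                collect(kid, found);
            }
        }
    }
}
```

      A root node with no Names entry and two children that each hold one file yields both files. A production version would probably also want to guard against cyclic Kids references in malformed PDFs.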

      Attachments

        1. pdf_with_doc_and_text_attached.pdf
          2.21 MB
          Jason Sherman

          People

            Assignee: Unassigned
            Reporter: Jason Sherman (agi20dla)
            Votes: 0
            Watchers: 3
