Tika / TIKA-389

Garbled metadata when dealing with encrypted PDF files.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.9
    • Component/s: metadata, parser
    • Labels: None
    • Environment: Windows 7 64-bit

    Description

      The code exhibiting this issue is very simple:

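      // Assumed setup (not shown in the report): file is a java.io.File,
      // tikaParser is an org.apache.tika.parser.Parser.
      Metadata metadata = new Metadata();
      InputStream input = new FileInputStream(file);
      try {
          ContentHandler textHandler = new BodyContentHandler();
          tikaParser.parse(input, textHandler, metadata);
      } finally {
          input.close(); // close the stream even if parsing throws
      }
      System.out.println(metadata);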

      The output:
      title=?a?▬÷&▼♂?ŢjK???ž?↑M?A→<═]1
      =╬\bK Author=═g?═?♦ Content-Type=application/pdf creator=?k?═?♦Ý`;Ý?)?Ě?3n
      Î☼46ËO

      Apart from the garbled metadata values, the extracted text is 100% correct.
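
      Glyphs such as ▬, ▼ and ♦ in the output above are how a Windows console commonly renders control characters, so the metadata strings probably contain raw, undecrypted bytes. As a rough way to spot this in a regression test, the sketch below flags values that contain control characters. The class MetadataSanityCheck and the method looksGarbled are hypothetical names, not part of Tika, and a plain Map stands in for the Tika Metadata object so the example runs without the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: flags metadata values that contain
// control characters, a likely sign of undecrypted or mis-decoded bytes.
class MetadataSanityCheck {

    // Returns true if the value contains an ISO control character other than
    // tab, line feed, or carriage return.
    static boolean looksGarbled(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean allowedWhitespace = c == '\t' || c == '\n' || c == '\r';
            if (Character.isISOControl(c) && !allowedWhitespace) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for the name/value pairs held by a Tika Metadata object.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Content-Type", "application/pdf");
        metadata.put("Author", "\u0004g\u0016\u001f"); // made-up garbled value
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            System.out.println(e.getKey() + " garbled? " + looksGarbled(e.getValue()));
        }
    }
}
```

      This is only a heuristic: garbled bytes that happen to map to printable characters pass unnoticed, so it complements rather than replaces comparing against the known title and author of a test document.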

            People

              Assignee: Unassigned
              Reporter: Gabriel Miklos (socketbind)