Uploaded image for project: 'Tika'
  1. Tika
  2. TIKA-1489

PDF Text extraction without permission

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.7
    • 1.8
    • None
    • None

    Description

      In TIKA-1442 text extraction from files like 717226.pdf that don't have text extraction permission works. The permissions in PDF files are only enforced by the application (i.e. PDFBox), i.e. the text information isn't stored separately in encrypted form.

      PDFBox ExtractText command line does throw an exception.
      So I wonder why TIKA is able to extract text. Either TIKA or the PDFBox call used bypasses the permission checking.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              tilman Tilman Hausherr
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: