PDFBOX-890: Can't extract text from PDF

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.5.0
    • Component/s: Text extraction
    • Labels: None

    Description

      I created a simple PDF using the Bullzip PDF Printer (a virtual Windows printer).
      PDFBox is not able to extract the text from this PDF; it just returns some low ASCII characters.

      command:
      @java -jar pdfbox-app-1.3.1.jar ExtractText -console test.pdf
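
      The symptom described above (output made up of low ASCII characters instead of
      readable text) can be checked mechanically. The following is only a minimal sketch
      in plain Java: the class LowAsciiCheck, its methods, and the threshold parameter are
      hypothetical helpers for illustration and are not part of PDFBox. It assumes the
      extracted text has already been captured as a String.

```java
// Hypothetical helper, not part of PDFBox: flags extracted text that
// consists largely of control characters (the symptom in this report).
final class LowAsciiCheck {
    private LowAsciiCheck() {}

    /**
     * Returns the fraction of characters below U+0020, not counting
     * tab, line feed and carriage return, which occur in normal text.
     */
    static double controlCharRatio(String text) {
        if (text.isEmpty()) {
            return 0.0;
        }
        int control = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                control++;
            }
        }
        return (double) control / text.length();
    }

    /** True if the control-character ratio exceeds the caller-chosen threshold. */
    static boolean looksGarbled(String text, double threshold) {
        return controlCharRatio(text) > threshold;
    }
}
```

      A call such as LowAsciiCheck.looksGarbled(extracted, 0.3) could then be used to
      flag suspicious output; the value 0.3 is an arbitrary assumption, not a
      recommended setting.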

      Attachments

        1. PDFBOX-890.patch
          0.8 kB
          Martijn Brinkers
        2. test.pdf
          7 kB
          Igor Spasic

            People

              Assignee: Andreas Lehmkühler (lehmi)
              Reporter: Igor Spasic (najgor)
              Votes: 0
              Watchers: 1
