PDFBox / PDFBOX-3058 Support TIKA Migration to PDFBox 2.0 / PDFBOX-3119

Text extraction partially garbled in this file, was OK in 1.8

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None

    Description

      I don't know whether this is the same problem as PDFBOX-3066. The text set in the F4 font is extracted as garbage in 2.0, but is extracted correctly by Adobe Reader (AR) and by 1.8.

      Attachments

        1. PDFBOX-3119-545359-p2.pdf
          77 kB
          Tilman Hausherr
        2. PDFBOX-3119-545359-p2-18.txt
          2 kB
          Tilman Hausherr
        3. PDFBOX-3119-545359-p2-20.txt
          2 kB
          Tilman Hausherr
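
To see which lines of the extracted text changed between versions, the two text attachments can be compared line by line. The following is a minimal sketch using only the Python standard library. It assumes that the `-18.txt` and `-20.txt` attachments hold the 1.8 and 2.0 extraction output respectively (inferred from the file names, not stated in the issue), and `changed_lines` is a hypothetical helper name, not part of PDFBox.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return the lines that differ between two extraction outputs.

    Removed lines (present only in old_text) are prefixed with '-',
    added lines (present only in new_text) with '+'.
    """
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="1.8",
            tofile="2.0",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # The first two entries are the '---' / '+++' file headers; skip them.
    body = diff[2:]
    # Drop the '@@ ... @@' hunk markers and keep only added/removed lines.
    return [line for line in body if line.startswith(("+", "-"))]
```

With both attachments downloaded locally, each file would be read as text and the two strings passed to `changed_lines`. The lines prefixed with `-` and `+` then show where the 2.0 output diverges from the 1.8 output.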

            People

              Assignee: Tilman Hausherr
              Reporter: Tilman Hausherr
              Votes: 0
              Watchers: 1
