PDFBox / PDFBOX-130

text extraction replaces characters

Details

    • Bug
    • Status: Closed
    • Resolution: Cannot Reproduce
    • Text extraction

    Description

      [imported from SourceForge]
      http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1423868
      Originally submitted by benlitchfield on 2006-02-03 18:32.

      See 056_PCA119.pdf

      First of all, a few minor problems:

      • Apostrophes are replaced by question marks. A global
        search-and-replace cannot undo this, because it would
        also alter any genuine question marks in the document.
      • The 'fi' in 'glorified' at the end of the main
        article is also replaced by '?', as it is in at least
        two other instances.
      • The 'J' before the author's name is presumably a
        substitute for a non-text character, and should
        probably be replaced by a blank line instead.
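
The report does not identify a cause, and the issue was closed as Cannot Reproduce, so the following is only a guess at one common way such '?' substitutions can arise in Java: encoding extracted text into a charset that cannot represent a typographic apostrophe (U+2019) or an 'fi' ligature (U+FB01). The sketch below uses only the standard library, not any PDFBox API. The class and method names (ReplacementDemo, toAsciiLossy, normalize) are hypothetical, and the sample string is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class ReplacementDemo {
    // Round-trips text through US-ASCII. String.getBytes(Charset) replaces
    // every unmappable character with the charset's replacement byte, '?'.
    public static String toAsciiLossy(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Assumption: mapping typographic characters to ASCII equivalents
    // BEFORE the lossy step is acceptable for the use case.
    public static String normalize(String s) {
        // NFKC expands compatibility ligatures such as U+FB01 into "fi".
        String t = Normalizer.normalize(s, Normalizer.Form.NFKC);
        // Curly single quotes have no decomposition, so map them by hand.
        return t.replace('\u2019', '\'').replace('\u2018', '\'');
    }

    public static void main(String[] args) {
        // Made-up sample containing a curly apostrophe, a ligature,
        // and one genuine question mark.
        String extracted = "author\u2019s glori\uFB01ed?";
        System.out.println(toAsciiLossy(extracted));            // author?s glori?ed?
        System.out.println(toAsciiLossy(normalize(extracted))); // author's glorified?
    }
}
```

Once the lossy step has run, the genuine question mark is indistinguishable from the substituted ones, which is why a global replace on the output cannot work. Any such mapping would have to happen before the characters are lost.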
