PDFBOX-805: Extracted ASCII text in CJK document is malformed

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 1.3.1
    • Component/s: FontBox
    • Labels: None

    Description

      When I run ExtractText on a CJK PDF document that also contains ASCII text, only the ASCII text comes out malformed. This does not occur in version 1.1.0.
      The attached patch fixes the problem. I have also attached an example PDF.
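
The report does not say what the attached patch changes, so the following is only a generic illustration, not PDFBox or FontBox code. It sketches one way ASCII text can end up garbled in a document that mixes one-byte and two-byte character codes: a decoder that always consumes two bytes will fuse adjacent ASCII characters into a single code. The class name, both method names, and the "bytes below 0x80 are one-byte codes" rule are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CodeWidthDemo {

    // Hypothetical mixed-width rule (an assumption for this sketch):
    // bytes 0x00-0x7F are one-byte codes, any other byte starts a two-byte code.
    static List<Integer> splitMixedWidth(byte[] data) {
        List<Integer> codes = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            int b = data[i] & 0xFF;
            if (b < 0x80 || i + 1 >= data.length) {
                codes.add(b);                                // one-byte code
                i += 1;
            } else {
                codes.add((b << 8) | (data[i + 1] & 0xFF));  // two-byte code
                i += 2;
            }
        }
        return codes;
    }

    // Naive decoder that always reads two bytes per code.
    // Adjacent ASCII bytes get merged into one meaningless code.
    static List<Integer> splitFixedTwoBytes(byte[] data) {
        List<Integer> codes = new ArrayList<>();
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            if (i + 1 < data.length) {
                codes.add((hi << 8) | (data[i + 1] & 0xFF));
            } else {
                codes.add(hi);                               // trailing odd byte
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        byte[] ascii = "PDF".getBytes(StandardCharsets.US_ASCII);
        System.out.println(splitMixedWidth(ascii));     // [80, 68, 70]
        System.out.println(splitFixedTwoBytes(ascii));  // [20548, 70]
    }
}
```

With the mixed-width rule, "PDF" stays three separate codes. With the fixed two-byte rule, 'P' and 'D' collapse into the single code 0x5044, which no longer maps to either letter.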

      Attachments

        1. cjk.pdf
          3 kB
          Keiji Suzuki
        2. CMapParser.java.patch
          1.0 kB
          Keiji Suzuki
        3. cjk.pdf
          24 kB
          Keiji Suzuki
        4. extracted.txt
          0.4 kB
          Keiji Suzuki

          People

            Assignee: Unassigned
            Reporter: Keiji Suzuki (zuki_ebetsu)
            Votes: 0
            Watchers: 0
