[PDFBOX-212] PDF Document cut German Umlauts - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Duplicate
Affects Version/s: 1.2.1
Fix Version/s: None
Component/s: Writing
Labels: None

Description

[imported from SourceForge]
http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1587745
Originally submitted by kajiro on 2006-10-31 01:05.

I use the class TextToPDF to create a PDF document
from a text file. That works correctly with simple
text. But when I use German umlauts in the text, such as
ä, ö, ü or ß, the PDF document drops these letters.

Attached is a sample document containing four words with
incorrect umlauts!

[attachment on SourceForge]
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1587745&file_id=200742
bsp.pdf (application/pdf), 958 bytes
Umlauts are incorrect

[comment on SourceForge]
Originally sent by benlitchfield.
Logged In: YES
user_id=601708
Originator: NO

To the anonymous poster, did you mean for both PDF links to be the same?

Ben

[comment on SourceForge]
Originally sent by nobody.
Logged In: NO

For this PDF file, which contains accented Latin-1
characters:
http://acl.ldc.upenn.edu//P/P06/P06-2052.pdf
I get a u with an umlaut converted into "currency1u"
(look at the first name on the first page).

For the following file containing Japanese characters:
http://acl.ldc.upenn.edu//P/P06/P06-2052.pdf
I get the error:
java.io.IOException: Unknown encoding for 'H'
I also can't seem to cut and paste the form.
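
The linked issue PDFBOX-922 attributes this to TrueType fonts supporting only WinAnsiEncoding. As a rough triage aid, here is a minimal sketch for checking which characters of an input text fall outside a WinAnsi-like encoding. It assumes Java's windows-1252 charset is a close enough stand-in for PDF WinAnsiEncoding (the two are similar but not identical), it does not call PDFBox at all, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class WinAnsiCheck {
    // Assumption: windows-1252 approximates PDF WinAnsiEncoding.
    private static final Charset WIN_1252 = Charset.forName("windows-1252");

    /** Returns the characters of text that windows-1252 cannot encode, in input order. */
    public static String unencodable(String text) {
        CharsetEncoder encoder = WIN_1252.newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            if (!encoder.canEncode(ch)) {
                missing.append(ch);
            }
            i += Character.charCount(codePoint);
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // German umlauts and sharp s, written as escapes to avoid source-encoding issues.
        System.out.println("[" + unencodable("\u00e4\u00f6\u00fc\u00df") + "]");
        // Three Japanese characters, none of which exist in windows-1252.
        System.out.println("[" + unencodable("\u65e5\u672c\u8a9e") + "]");
    }
}
```

Under that assumption the umlauts are all encodable, which suggests that text like the reporter's is lost somewhere other than the target encoding's repertoire (for example when the input file is read), whereas the Japanese characters cannot be represented at all.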

Issue Links

duplicates

PDFBOX-922 True type PDFont subclass only supports WinAnsiEncoding (hardcoded!)

Closed

People

Assignee: Unassigned

Reporter: Anonymous

Votes: 1

Watchers: 4

Dates

Created: 31/Oct/06 09:05

Updated: 11/Oct/14 15:48

Resolved: 11/Oct/14 15:48