PDFBox
  1. PDFBox
  2. PDFBOX-922

True type PDFont subclass only supports WinAnsiEncoding (hardcoded!)

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 2.0.0
    • Component/s: Writing
    • Labels:
      None
    • Environment:
      JDK 1.6 / OS irrelevant, tried against 1.3.1 and 1.2.0

      Description

      PDFBox cannot embed Identity-H or Identity-V type TTF fonts in the PDF it creates, making it impossible to create PDFs in any language apart from English and ones supported in WinAnsiEncoding. This behaviour is caused because method PDTrueTypeFont.loadTTF has hardcoded WinAnsiEncoding inside, and there is no Identity-H or Identity-V Encoding classes provided (to set afterwards via PDFont.setFont() )

      This excludes the following languages plus many others:

      • Greek
      • Bulgarian
      • Swedish
      • Baltic languages
      • Malteze

      The PDF created contains garbled characters and/or squares.

      Simple test case:

                      PDDocument doc = null;
      		try {
      			doc = new PDDocument();
      			PDPage page = new PDPage();
      			doc.addPage(page);
      			// extract fonts for fields
      			byte[] arialNorm = extractFont("arial.ttf");
      			//byte[] arialBold = extractFont("arialbd.ttf"); 
      			//PDFont font = PDType1Font.HELVETICA;
      			PDFont font = PDTrueTypeFont.loadTTF(doc, new ByteArrayInputStream(arialNorm));
      			
      			PDPageContentStream contentStream = new PDPageContentStream(doc, page);
      			contentStream.beginText();
      			contentStream.setFont(font, 12);
      			contentStream.moveTextPositionByAmount(100, 700);
      			contentStream.drawString("Hello world from PDFBox ελληνικά"); // text here may appear garbled; insert any text in Greek or Bulgarian or Malteze
      			contentStream.endText();
      			contentStream.close();
      			doc.save("pdfbox.pdf");
      			System.out.println(" created!");
      		} catch (Exception ioe) {
      			ioe.printStackTrace();
      		} finally {
      			if (doc != null) {
      				try { doc.close(); } catch (Exception e) {}
      			}
      		}
      
      1. pdfbox-unicode2.diff
        14 kB
        Antti Lankila
      2. pdfbox-unicode.diff
        11 kB
        Antti Lankila

        Issue Links

          Activity

          Hide
          Thanos Agelatos added a comment -

          No PDF expert but would the 'reverse' of work done in PDFBOX-654 be sufficient to be able to encode Identity-H in new PDFs?

          Show
          Thanos Agelatos added a comment - No PDF expert but would the 'reverse' of work done in PDFBOX-654 be sufficient to be able to encode Identity-H in new PDFs?
          Hide
          Andreas Lehmkühler added a comment -

          PDFBOX-654 is about text extraction. But you are correct, WinAnsiEncoding is hardcoded inside PDTrueTypeFont. For now PDFBox hasn't any support for Identiy-H as encoding when adding text.

          Show
          Andreas Lehmkühler added a comment - PDFBOX-654 is about text extraction. But you are correct, WinAnsiEncoding is hardcoded inside PDTrueTypeFont. For now PDFBox hasn't any support for Identiy-H as encoding when adding text.
          Hide
          Thanos Agelatos added a comment -

          Andreas,
          thank you for the reply. I assumed that since code affected from PDFBOX-654 does some Identity-H parsing from the PDF the opposite could achieve what is requested here. Anyways, do you have some planning on when this feature will be come available? Limiting to WinAnsi is taking out many of the languages that we want PDFs generated for.

          thanks in advance
          Thanos

          Show
          Thanos Agelatos added a comment - Andreas, thank you for the reply. I assumed that since code affected from PDFBOX-654 does some Identity-H parsing from the PDF the opposite could achieve what is requested here. Anyways, do you have some planning on when this feature will be come available? Limiting to WinAnsi is taking out many of the languages that we want PDFs generated for. thanks in advance Thanos
          Hide
          Wolfgang Glas added a comment -

          I have implemented a glyph extractor, so that subfonts with less than or equal 256 glyphs may be extracted from a large TTF font.

          The code may be found there and is licensed under the terms of the apache licenese:

          http://svn.clazzes.org/svn/sketch/trunk/pdf/pdf-entities/src/main/java/org/clazzes/sketch/pdf/entities/impl/TTFSubFont.java

          We use the the code successfully to write PDFs with full-featured unicode strings by splitting the TTF font to smaller subfonts.

          Show
          Wolfgang Glas added a comment - I have implemented a glyph extractor, so that subfonts with less than or equal 256 glyphs may be extracted from a large TTF font. The code may be found there and is licensed under the terms of the apache licenese: http://svn.clazzes.org/svn/sketch/trunk/pdf/pdf-entities/src/main/java/org/clazzes/sketch/pdf/entities/impl/TTFSubFont.java We use the the code successfully to write PDFs with full-featured unicode strings by splitting the TTF font to smaller subfonts.
          Hide
          Charlie B added a comment -

          Wow. Great timing .. found it hard to believe that this issue wasn't getting more traction. I recently discovered PDFBox and coded up a prototype for generating docs ...only to find this show stopper.

          Wolfgang could you give a bit more detail on how you use the extractor? Looks like in EntitiesPdfRenderer .. you have a getSubFont(), but I'm not quite sure how to apply the subfonts.

          Also, is there any plan at all to support full TTFs in PDFBox proper?

          Show
          Charlie B added a comment - Wow. Great timing .. found it hard to believe that this issue wasn't getting more traction. I recently discovered PDFBox and coded up a prototype for generating docs ...only to find this show stopper. Wolfgang could you give a bit more detail on how you use the extractor? Looks like in EntitiesPdfRenderer .. you have a getSubFont(), but I'm not quite sure how to apply the subfonts. Also, is there any plan at all to support full TTFs in PDFBox proper?
          Hide
          Andreas Lehmkühler added a comment -

          @Wolfgang
          Sounds good to me, but I can't find any license information neither as header nor somewhere on the website. Can you somehow add those information, just for the record?

          Show
          Andreas Lehmkühler added a comment - @Wolfgang Sounds good to me, but I can't find any license information neither as header nor somewhere on the website. Can you somehow add those information, just for the record?
          Hide
          Wolfgang Glas added a comment -

          Andreas, I've added the apache licensing terms to the TTFSubFont file.

          @Charlie: Inside

          http://svn.clazzes.org/svn/sketch/trunk/pdf/pdf-entities/src/main/java/org/clazzes/sketch/pdf/entities/EntitiesPdfRenderer.java

          You find an example on how to construct 256 glyph subfonts out of a large uncode-support font.

          http://svn.clazzes.org/svn/sketch/trunk/pdf/pdf-entities/src/main/java/org/clazzes/sketch/pdf/entities/impl/PdfRenderContextImpl.java

          gives you the code to draw a string composed out of multiple unicode blocks.

          Please note, that you have to set the /Length1 property of the embedded TTF font stream. (Therefore I introduced the new PDTrueTypeFont.loadFont(PDStream, Encoding) method, note the PDStream argument...)

          Furthermore, I wrote my own accessor interface to adobe's glyphlist, because the pdfbox API inside pdfbox's Encoding is not optimal. (unicode code points are not represented as int's, no static accessor to the parsed glyph list...)

          And yes, I'd really like to see this integrated into pdfbox, but as I pointed out it will need some finetuning, OpenType support, testing etc...

          My class is geared towards Microsoft's core fonts and not more.

          Best regards, Wolfgang

          Show
          Wolfgang Glas added a comment - Andreas, I've added the apache licensing terms to the TTFSubFont file. @Charlie: Inside http://svn.clazzes.org/svn/sketch/trunk/pdf/pdf-entities/src/main/java/org/clazzes/sketch/pdf/entities/EntitiesPdfRenderer.java You find an example on how to construct 256 glyph subfonts out of a large uncode-support font. http://svn.clazzes.org/svn/sketch/trunk/pdf/pdf-entities/src/main/java/org/clazzes/sketch/pdf/entities/impl/PdfRenderContextImpl.java gives you the code to draw a string composed out of multiple unicode blocks. Please note, that you have to set the /Length1 property of the embedded TTF font stream. (Therefore I introduced the new PDTrueTypeFont.loadFont(PDStream, Encoding) method, note the PDStream argument...) Furthermore, I wrote my own accessor interface to adobe's glyphlist, because the pdfbox API inside pdfbox's Encoding is not optimal. (unicode code points are not represented as int's, no static accessor to the parsed glyph list...) And yes, I'd really like to see this integrated into pdfbox, but as I pointed out it will need some finetuning, OpenType support, testing etc... My class is geared towards Microsoft's core fonts and not more. Best regards, Wolfgang
          Hide
          Charlie B added a comment -

          @Wolfgang

          Thanks for the pointers ... after spending some time with the code I think I get the patterns - very cool work ... are you willing to share the mods you've made PDFBox (new loadFont method and Encoding changes)?

          Show
          Charlie B added a comment - @Wolfgang Thanks for the pointers ... after spending some time with the code I think I get the patterns - very cool work ... are you willing to share the mods you've made PDFBox (new loadFont method and Encoding changes)?
          Hide
          Wolfgang Glas added a comment -

          Charlie,

          I forgot to mention, that my work is based on the patch attached to PDFBOX-954.
          There you will find the improved loadFont() method and the according Encoding changes.

          Andreas and I have arranged a metting next week, where we will discuss on how to integrate my patch into pdfbox. Furthermore, we will work out a way on how to further improve the TTF-Unicode suppor. Surely, we will report our findings to the mailinglist and open subsequent jira issues as required.

          Wolfgang

          Show
          Wolfgang Glas added a comment - Charlie, I forgot to mention, that my work is based on the patch attached to PDFBOX-954 . There you will find the improved loadFont() method and the according Encoding changes. Andreas and I have arranged a metting next week, where we will discuss on how to integrate my patch into pdfbox. Furthermore, we will work out a way on how to further improve the TTF-Unicode suppor. Surely, we will report our findings to the mailinglist and open subsequent jira issues as required. Wolfgang
          Hide
          Charlie B added a comment -

          Hi Wolfgang, Andreas,

          Wondering how your meeting went. Before I start customizing for TTF-Unicode writing I'd like to know more about any plans for productizing.

          Thanks!

          Show
          Charlie B added a comment - Hi Wolfgang, Andreas, Wondering how your meeting went. Before I start customizing for TTF-Unicode writing I'd like to know more about any plans for productizing. Thanks!
          Hide
          Wolfgang Glas added a comment -

          Hi Charlie,

          Sorry for coming up so late, stuffed with work here...

          Basically, Andreas and I agreed in introducing a unicode-aware showtext-API in pdfbox-2.0. I will announce plans on the mailinglist and create issues likewise, when the dust on my desk settles,

          Best regards, Wolfgang

          Show
          Wolfgang Glas added a comment - Hi Charlie, Sorry for coming up so late, stuffed with work here... Basically, Andreas and I agreed in introducing a unicode-aware showtext-API in pdfbox-2.0. I will announce plans on the mailinglist and create issues likewise, when the dust on my desk settles, Best regards, Wolfgang
          Hide
          Charlie B added a comment -

          Hi Wolfgang, Andreas,

          Again I'm wondering if you have any solid plans for unicode text API in the near future?

          Thanks for any info or ETA on 2.0.

          Show
          Charlie B added a comment - Hi Wolfgang, Andreas, Again I'm wondering if you have any solid plans for unicode text API in the near future? Thanks for any info or ETA on 2.0.
          Hide
          Dinko Ivanov added a comment -

          Hello Andreas,

          We need to export Cyrillic content in PDF files. We've already invested significant effort in facilitating PDFBox for our needs and would like to somehow workaround this problem.
          Do you have any update on plans for including this feature in PDFBox?

          @Wolfgang: Could you share some more details/basic steps on how the solution in Sketch framework could be reused?
          I tried a simple scenario (export Drawing containing Cyrillic symbols to PDF), but without success. I think I'm missing something.

          Thanks and regards,
          Dinko

          Show
          Dinko Ivanov added a comment - Hello Andreas, We need to export Cyrillic content in PDF files. We've already invested significant effort in facilitating PDFBox for our needs and would like to somehow workaround this problem. Do you have any update on plans for including this feature in PDFBox? @Wolfgang: Could you share some more details/basic steps on how the solution in Sketch framework could be reused? I tried a simple scenario (export Drawing containing Cyrillic symbols to PDF), but without success. I think I'm missing something. Thanks and regards, Dinko
          Hide
          Charlie B added a comment -

          Hi gang,

          Any update here? Even a very loose idea of timing would be very helpful.

          Thanks,

          • Charlie
          Show
          Charlie B added a comment - Hi gang, Any update here? Even a very loose idea of timing would be very helpful. Thanks, Charlie
          Hide
          Wolfgang Glas added a comment -

          Hi Charlie,

          I have bad news for you. We have an an enormous struggle to get our projects done this year and I really do not have capacities to dive into pdfbox any deeper. I can answer any kind of questions, if somebody steps up to do the implementation of a wider unicode support in pdfbox's writing API, but I cannot do the implementation and testing, sorry for that.

          I'd really love to come to ApacheCon in Sinsheim, but we do need our projects done, that's life

          Best regards, Wolfgang

          Show
          Wolfgang Glas added a comment - Hi Charlie, I have bad news for you. We have an an enormous struggle to get our projects done this year and I really do not have capacities to dive into pdfbox any deeper. I can answer any kind of questions, if somebody steps up to do the implementation of a wider unicode support in pdfbox's writing API, but I cannot do the implementation and testing, sorry for that. I'd really love to come to ApacheCon in Sinsheim, but we do need our projects done, that's life Best regards, Wolfgang
          Hide
          Andreas Lehmkühler added a comment -

          As a first step I added TTFSubFont support in revision 1413777 based on Wolfgang Glas code.

          Show
          Andreas Lehmkühler added a comment - As a first step I added TTFSubFont support in revision 1413777 based on Wolfgang Glas code.
          Hide
          Antti Lankila added a comment - - edited

          I'm no expert with PDF, but I looked into the problem yesterday and this morning, and came up with this.

          Candidate specification for Unicode text writing support

          1. Each TTF font, when loaded, will be embedded as stream in the document. Two font descriptors will be created per call:

          • TTF descriptor itself
          • CIDFont Type 2 descriptor, which will be referenced by TTF

          2. CMap maps from character code to character id (CID). COSString will write unicode strings when required, and it's probably simplest if the CIDs are also just unicode codepoints.

          • Encoding will be Identity-H.
          • To support copy-paste, the ToUnicode table needs to be provided, and is also identity map.

          3. Character id is mapped to glyph id (GID). There are actually two major CIDFont types:

          • CIDFont Type 0: contains CFF or OpenType fonts that have intrinsic CID->glyph mapping.
            • this presumably means that the CIDs are font-specific and therefore CMap must be read from a font to support Type 0.
          • CIDFont Type 2: contains TrueType fonts which must have a CIDToGIDMap that declares how to map from CID to GID.
            • TTF files will probably have a Windows platform Unicode encoding, which is the unicode codepoint to glyph id map, and thus the CIDToGIDMap we must write. The map can be streamed and compressed and should not take much space.
          Consequences of the design
          • PDF as a document will be remarkably readable, though COSString tends to use hexadecimal format way too often. (Bug to be fixed? I feel that COSString should be based on chars (e.g. StringBuilder), not bytes (ByteArrayOutputStream).)
          • design is relatively simple; the hard work will be writing the CIDToGIDMap table, but this will be based on the Windows Unicode encoding table in TTF and should be trivial to generate.
          • fonts will have all of their characters embedded in the PDF

          I can't promise when I have time to implement this, but as far as I understand it, something like this is what it takes.

          Show
          Antti Lankila added a comment - - edited I'm no expert with PDF, but I looked into the problem yesterday and this morning, and came up with this. Candidate specification for Unicode text writing support 1. Each TTF font, when loaded, will be embedded as stream in the document. Two font descriptors will be created per call: TTF descriptor itself CIDFont Type 2 descriptor, which will be referenced by TTF 2. CMap maps from character code to character id (CID). COSString will write unicode strings when required, and it's probably simplest if the CIDs are also just unicode codepoints. Encoding will be Identity-H. To support copy-paste, the ToUnicode table needs to be provided, and is also identity map. 3. Character id is mapped to glyph id (GID). There are actually two major CIDFont types: CIDFont Type 0: contains CFF or OpenType fonts that have intrinsic CID->glyph mapping. this presumably means that the CIDs are font-specific and therefore CMap must be read from a font to support Type 0. CIDFont Type 2: contains TrueType fonts which must have a CIDToGIDMap that declares how to map from CID to GID. TTF files will probably have a Windows platform Unicode encoding, which is the unicode codepoint to glyph id map, and thus the CIDToGIDMap we must write. The map can be streamed and compressed and should not take much space. Consequences of the design PDF as a document will be remarkably readable, though COSString tends to use hexadecimal format way too often. (Bug to be fixed? I feel that COSString should be based on chars (e.g. StringBuilder), not bytes (ByteArrayOutputStream).) design is relatively simple; the hard work will be writing the CIDToGIDMap table, but this will be based on the Windows Unicode encoding table in TTF and should be trivial to generate. fonts will have all of their characters embedded in the PDF I can't promise when I have time to implement this, but as far as I understand it, something like this is what it takes.
          Hide
          John Hewson added a comment -

          Antti, you're along the right lines, it's actually simpler than that. You don't need a CIDToGIDMap, it can just be the name Identity, in which case CID = GID for all characters in the font, which means you can use the raw GIDs in the ToUnicode map and in COSStrings when they are written to the content stream.

          You'll need to subset the TTF fonts because they're usually quite large, but PDFBox has some code for doing this already.

          Show
          John Hewson added a comment - Antti, you're along the right lines, it's actually simpler than that. You don't need a CIDToGIDMap , it can just be the name Identity , in which case CID = GID for all characters in the font, which means you can use the raw GIDs in the ToUnicode map and in COSStrings when they are written to the content stream. You'll need to subset the TTF fonts because they're usually quite large, but PDFBox has some code for doing this already.
          Hide
          Antti Lankila added a comment -

          OK, but the fact is that I'd really prefer the COSString to be true unicode. That should be easy to interpret. and matches what the text encoding notionally should be (UTF-16BE when starting with the byte sequence 0xfe 0xff.)

          Show
          Antti Lankila added a comment - OK, but the fact is that I'd really prefer the COSString to be true unicode. That should be easy to interpret. and matches what the text encoding notionally should be (UTF-16BE when starting with the byte sequence 0xfe 0xff.)
          Hide
          John Hewson added a comment - - edited

          It shouldn't make any difference - the ToUnicode map defines the mapping to unicode, not the character codes embedded in the content stream. Of course there's no harm in using UTF-16 (when possible, see below) instead of PDFDocEncoding, but be aware that PDF readers use the ToUnicode map, as long as it's present.

          I should add that it isn't always possible to use Unicode for glyph encoding, because not every glyph has a a unique unicode point. For example, a font may include a set of normal characters and a set of small caps characters, but only one of these can map to the unicode "A" character. The other is forced to map to some other code, which is why GIDs are typically used with TrueType fonts, because we can guarantee that each glyph has a unique GID and the ToUnicode map can be used to map both the normal "A" and small cap "A" to Unicode "A".

          In other words: Unicode code point != glyph

          Show
          John Hewson added a comment - - edited It shouldn't make any difference - the ToUnicode map defines the mapping to unicode, not the character codes embedded in the content stream. Of course there's no harm in using UTF-16 (when possible, see below) instead of PDFDocEncoding, but be aware that PDF readers use the ToUnicode map, as long as it's present. I should add that it isn't always possible to use Unicode for glyph encoding, because not every glyph has a a unique unicode point. For example, a font may include a set of normal characters and a set of small caps characters, but only one of these can map to the unicode "A" character. The other is forced to map to some other code, which is why GIDs are typically used with TrueType fonts, because we can guarantee that each glyph has a unique GID and the ToUnicode map can be used to map both the normal "A" and small cap "A" to Unicode "A". In other words: Unicode code point != glyph
          Hide
          Antti Lankila added a comment -

          Well it makes the difference that when you construct a COSString, the default approach is to either render it as unicode or ascii. So UTF-16BE seems like the path of least resistance, not to mention that I like it for the reason that it's a defined standard and should follow the principle of least astonishment. As I mention above, I'm not very happy about COSString. I think it should be based on some character abstraction, rather than byte stream.

          Show
          Antti Lankila added a comment - Well it makes the difference that when you construct a COSString, the default approach is to either render it as unicode or ascii. So UTF-16BE seems like the path of least resistance, not to mention that I like it for the reason that it's a defined standard and should follow the principle of least astonishment. As I mention above, I'm not very happy about COSString. I think it should be based on some character abstraction, rather than byte stream.
          Hide
          Antti Lankila added a comment -

          OK. I got something that probably qualifies for the worst possible implementation of Unicode text writing in a PDF generation library in the entire history of mankind. Consider this an early preview.

          All that matters is that I did see this pile of garbage spit out unicode text when used with TTF font that has Windows platform Unicode encoding CMAP table.

          To use it:

          • PDType0Font.loadTTF() is a new method that generates a Type0 font with CIDFont Type2 hanging from it. The old TrueTypeFont.loadTTF is still usable, but you won't get Unicode text capabilities.
          • PDContentStream has a new method, drawUnicodeString(), which must be used when drawing text using a CID font. This generates the required 16-bit strings into the document.

          It turns out that whenever a CID font is used, all text strings meant to be printed will be read as 16-bit big-endian values. So there's no point to mess with PDFDocEncoding or UTF-16BE COSString or any of that stuff – drawing strings on page is a fundamentally special operation which depends entirely on the font being used.

          IMHO, the PDPageContentStream drawString should always be provided with the font that is currently being used for drawing so it could ask that font for instructions on how to correctly express the various glyphs.

          Show
          Antti Lankila added a comment - OK. I got something that probably qualifies for the worst possible implementation of Unicode text writing in a PDF generation library in the entire history of mankind. Consider this an early preview. All that matters is that I did see this pile of garbage spit out unicode text when used with TTF font that has Windows platform Unicode encoding CMAP table. To use it: PDType0Font.loadTTF() is a new method that generates a Type0 font with CIDFont Type2 hanging from it. The old TrueTypeFont.loadTTF is still usable, but you won't get Unicode text capabilities. PDContentStream has a new method, drawUnicodeString(), which must be used when drawing text using a CID font. This generates the required 16-bit strings into the document. It turns out that whenever a CID font is used, all text strings meant to be printed will be read as 16-bit big-endian values. So there's no point to mess with PDFDocEncoding or UTF-16BE COSString or any of that stuff – drawing strings on page is a fundamentally special operation which depends entirely on the font being used. IMHO, the PDPageContentStream drawString should always be provided with the font that is currently being used for drawing so it could ask that font for instructions on how to correctly express the various glyphs.
          Hide
          Antti Lankila added a comment - - edited

          Version 2, more palatable.

          This one uses Identity-H for charcode -> CID, and Identity for CID -> GID, and then has a few hacks in PDFont (encodeCID never has worked as far as I can tell) and PDType0Font to make it work.

          I would have liked to use the fontbox's CMap facility to do the codepoint -> CID conversion, but I could not work out how the CMap stuff works. The method lookupCID() is a CID -> String conversion, apparently, lookup(byte[], int, int) does the reverse but it goes into some CodespaceRange check that is probably not 8-bit clean. I just gave up trying to figure out what this is supposed to be doing and just did a hashmap in PDType0Font to do the String->CID conversion.

          There are probably other issues remaining, like PDFont's getStringWidth() starts out via conversion to ISO-8859-1, which can't be correct.

          Show
          Antti Lankila added a comment - - edited Version 2, more palatable. This one uses Identity-H for charcode -> CID, and Identity for CID -> GID, and then has a few hacks in PDFont (encodeCID never has worked as far as I can tell) and PDType0Font to make it work. I would have liked to use the fontbox's CMap facility to do the codepoint -> CID conversion, but I could not work out how the CMap stuff works. The method lookupCID() is a CID -> String conversion, apparently, lookup(byte[], int, int) does the reverse but it goes into some CodespaceRange check that is probably not 8-bit clean. I just gave up trying to figure out what this is supposed to be doing and just did a hashmap in PDType0Font to do the String->CID conversion. There are probably other issues remaining, like PDFont's getStringWidth() starts out via conversion to ISO-8859-1, which can't be correct.
          Hide
          John Hewson added a comment - - edited

          If you're using Identity-H for charcode -> CID and Identity for CID -> GID you're not going to be able to subset the font because doing so will change the GIDs and therefore the CIDs and therefore the charcodes (which will also no longer be the correct Unicode points).

          Show
          John Hewson added a comment - - edited If you're using Identity-H for charcode -> CID and Identity for CID -> GID you're not going to be able to subset the font because doing so will change the GIDs and therefore the CIDs and therefore the charcodes (which will also no longer be the correct Unicode points).
          Hide
          Antti Lankila added a comment -

          I do not really understand what makes you say that. Isn't subsetted font basically just a wholly different font file, just having a bunch of glyphs removed from the original one? For instance, assuming it is a TTF file, you drop bunch of glyphs and then update the cmaps to reference the appropriate glyph indexes, and then you have a new TTF file. If so, I can't see the problem because you are providing all the same information as with the original font, only with less glyphs included.

          On the other hand, I do understand that if you write the text stream using encoding of one font, then change the definition of the TTF font without re-encoding the text, then you definitely run into problems. But the only possible way to keep CID stable is to define a standard for them, such as that CIDs are UCS-2. This can be done, but as far as I can tell this limits code points to the less than 0x10000 range because CID font writing writes 16 bit character indexes by definition, and there is no notion of the surrogate pairs of UTF-16. It might not be a real problem in practice, but it's nevertheless a limitation that the identity mapping for glyph indexes does not have. The only limitation of the latter approach is that single font can't have more than 65536 glyphs.

          BTW, I've been quiet on this front because I solved my immediate problem by switching to a PDF rendering library called jPod. It's not so advanced as pdfbox, and it didn't support unicode text either, but it was possible to get CID keyed fonts to work on it without touching the library itself, just through providing appropriate COS objects and setting up an encoding based on the font's Windows Unicode cmap. I even managed to set up a working copypaste by providing the ToUnicode postscript program, so I got everything working nicely using that 2008-era library, but I had to write most of the PDF object factories myself.

          Show
          Antti Lankila added a comment - I do not really understand what makes you say that. Isn't subsetted font basically just a wholly different font file, just having a bunch of glyphs removed from the original one? For instance, assuming it is a TTF file, you drop bunch of glyphs and then update the cmaps to reference the appropriate glyph indexes, and then you have a new TTF file. If so, I can't see the problem because you are providing all the same information as with the original font, only with less glyphs included. On the other hand, I do understand that if you write the text stream using encoding of one font, then change the definition of the TTF font without re-encoding the text, then you definitely run into problems. But the only possible way to keep CID stable is to define a standard for them, such as that CIDs are UCS-2. This can be done, but as far as I can tell this limits code points to the less than 0x10000 range because CID font writing writes 16 bit character indexes by definition, and there is no notion of the surrogate pairs of UTF-16. It might not be a real problem in practice, but it's nevertheless a limitation that the identity mapping for glyph indexes does not have. The only limitation of the latter approach is that single font can't have more than 65536 glyphs. BTW, I've been quiet on this front because I solved my immediate problem by switching to a PDF rendering library called jPod. It's not so advanced as pdfbox, and it didn't support unicode text either, but it was possible to get CID keyed fonts to work on it without touching the library itself, just through providing appropriate COS objects and setting up an encoding based on the font's Windows Unicode cmap. I even managed to set up a working copypaste by providing the ToUnicode postscript program, so I got everything working nicely using that 2008-era library, but I had to write most of the PDF object factories myself.
          Hide
          Antti Lankila added a comment - - edited

          Anyway, let's take a look at the changes required in PDFBox to get the text writing to work properly.

          • drawString() in PDPageContentStream just writes the text into PDF as any COSString would choose to represent it. This is not the right thing to do. When the font is a CID keyed font, every glyph is 16 bit wide by definition, and COSString won't necessarily notice and write it correctly. Therefore, drawString() must know what font is currently being drawn, and ask that font to encode the String to whatever byte sequence it takes to draw those glyphs. So, PDFont must be added to the drawString() API, and PDFont ought to have a method for "public byte[] encode(String)". I would suggest encoding displayable text always as (<hex chars>) sequences because this encoding is simplest to implement and the easiest to make bug free.
          • PDFont needs a clearly specified API which performs java String to font-specific encoding transformation. The process is usually called encoding, and yields a byte array, and the reverse process of taking a byte array and interpreting it to String is called decoding. Observe that there are no methods in PDFont called decode(), and I have a hard time figuring out what any one of these methods actually do, because everything seems to be called "encode" or "lookup". It seems that the encode(byte[], int int) performs decoding, so it should be renamed such. In general I'd recommend pushing the encode/decode job down to the font layer. Provide just two methods: "byte[] encode(String)" and "String decode(byte[])". Their job is to convert between the byte sequences required by that font and java Strings, and they handle full runs of text, not just single characters. They will then use single- or multibyte encodings as the font requires without the higher level having to do crazy stuff like processEncodedText() currently does in PDFStreamEngine.
          • When implementing encoding, never ask for the char[] array of a Java String. Instead, "for (int i = 0, cp; i < string.length(); i += Character.charCount(cp)) { cp = string.codePointAt(i); ... now encode the codepoint ... }

            ". This will handle the UTF-16 surrogate pairs correctly.

          • There are unfortunately very many ways to encode text in PDF, and especially if text needs to be decodable from the byte stream generated by other programs, the full complexity must be faced and implemented. These are to be solved in a case-by-case basis in the PDFont hierarchy. The PDFont highest class methods for encode and decode should be defined as abstract to reflect the fact that encoding depends on the particular subtype of the font. It seems that Type1, TrueType, Type3, and CIDType0 and CIDType2 fonts require different handling from each other. It may be that for some of these fonts the implementation is same because the actual mechanics can be handled by varying the Encoding instance, though.
          Show
          Antti Lankila added a comment - - edited Anyway, let's take a look at the changes required in PDFBox to get the text writing to work properly. drawString() in PDPageContentStream just writes the text into PDF as any COSString would choose to represent it. This is not the right thing to do. When the font is a CID keyed font, every glyph is 16 bit wide by definition, and COSString won't necessarily notice and write it correctly. Therefore, drawString() must know what font is currently being drawn, and ask that font to encode the String to whatever byte sequence it takes to draw those glyphs. So, PDFont must be added to the drawString() API, and PDFont ought to have a method for "public byte[] encode(String)". I would suggest encoding displayable text always as (<hex chars>) sequences because this encoding is simplest to implement and the easiest to make bug free. PDFont needs a clearly specified API which performs java String to font-specific encoding transformation. The process is usually called encoding, and yields a byte array, and the reverse process of taking a byte array and interpreting it to String is called decoding. Observe that there are no methods in PDFont called decode(), and I have a hard time figuring out what any one of these methods actually do, because everything seems to be called "encode" or "lookup". It seems that the encode(byte[], int int) performs decoding, so it should be renamed such. In general I'd recommend pushing the encode/decode job down to the font layer. Provide just two methods: "byte[] encode(String)" and "String decode(byte[])". Their job is to convert between the byte sequences required by that font and java Strings, and they handle full runs of text, not just single characters. They will then use single- or multibyte encodings as the font requires without the higher level having to do crazy stuff like processEncodedText() currently does in PDFStreamEngine. When implementing encoding, never ask for the char[] array of a Java String. Instead, "for (int i = 0, cp; i < string.length(); i += Character.charCount(cp)) { cp = string.codePointAt(i); ... now encode the codepoint ... } ". This will handle the UTF-16 surrogate pairs correctly. There are unfortunately very many ways to encode text in PDF, and especially if text needs to be decodable from the byte stream generated by other programs, the full complexity must be faced and implemented. These are to be solved in a case-by-case basis in the PDFont hierarchy. The PDFont highest class methods for encode and decode should be defined as abstract to reflect the fact that encoding depends on the particular subtype of the font. It seems that Type1, TrueType, Type3, and CIDType0 and CIDType2 fonts require different handling from each other. It may be that for some of these fonts the implementation is same because the actual mechanics can be handled by varying the Encoding instance, though.
          Hide
          John Hewson added a comment - - edited

          I do not really understand what makes you say that. Isn't subsetted font basically just a wholly different font file, just having a bunch of glyphs removed from the original one? For instance, assuming it is a TTF file, you drop bunch of glyphs and then update the cmaps to reference the appropriate glyph indexes, and then you have a new TTF file. If so, I can't see the problem because you are providing all the same information as with the original font, only with less glyphs included.

          You said that you were using "Identity-H for charcode -> CID, and Identity for CID -> GID", which doesn't involve updating any cmaps. If you remove glyphs from a font then the GIDs will change, and if you're using an Identity cmap then your CIDs will by definition change also. But now you mention "update the cmaps", which isn't going to be an Identity cmap any more... so actually you're not wanting to use an Identity cmap.

          On the other hand, I do understand that if you write the text stream using encoding of one font, then change the definition of the TTF font without re-encoding the text, then you definitely run into problems. But the only possible way to keep CID stable is to define a standard for them, such as that CIDs are UCS-2

          Not necessarily, you could use a CIDToGIDMap which initially is an identity mapping but which is updated to reflect the new GIDs once the font is subset - that's a pretty good approach.

          This can be done, but as far as I can tell this limits code points to the less than 0x10000 range because CID font writing writes 16 bit character indexes by definition, and there is no notion of the surrogate pairs of UTF-16. It might not be a real problem in practice, but it's nevertheless a limitation that the identity mapping for glyph indexes does not have. The only limitation of the latter approach is that single font can't have more than 65536 glyphs.

          You had said that you wanted to use "identity CID -> GID" but you're going to need a font with tens of thousands of empty glyphs in order to have that CID also be a valid Unicode point... not what you want.

          Show
          John Hewson added a comment - - edited I do not really understand what makes you say that. Isn't subsetted font basically just a wholly different font file, just having a bunch of glyphs removed from the original one? For instance, assuming it is a TTF file, you drop bunch of glyphs and then update the cmaps to reference the appropriate glyph indexes, and then you have a new TTF file. If so, I can't see the problem because you are providing all the same information as with the original font, only with less glyphs included. You said that you were using "Identity-H for charcode -> CID, and Identity for CID -> GID", which doesn't involve updating any cmaps. If you remove glyphs from a font then the GIDs will change, and if you're using an Identity cmap then your CIDs will by definition change also. But now you mention "update the cmaps", which isn't going to be an Identity cmap any more... so actually you're not wanting to use an Identity cmap. On the other hand, I do understand that if you write the text stream using encoding of one font, then change the definition of the TTF font without re-encoding the text, then you definitely run into problems. But the only possible way to keep CID stable is to define a standard for them, such as that CIDs are UCS-2 Not necessarily, you could use a CIDToGIDMap which initially is an identity mapping but which is updated to reflect the new GIDs once the font is subset - that's a pretty good approach. This can be done, but as far as I can tell this limits code points to the less than 0x10000 range because CID font writing writes 16 bit character indexes by definition, and there is no notion of the surrogate pairs of UTF-16. It might not be a real problem in practice, but it's nevertheless a limitation that the identity mapping for glyph indexes does not have. The only limitation of the latter approach is that single font can't have more than 65536 glyphs. You had said that you wanted to use "identity CID -> GID" but you're going to need a font with tens of thousands of empty glyphs in order to have that CID also be a valid Unicode point... not what you want.
          Hide
          John Hewson added a comment - - edited

          drawString() in PDPageContentStream just writes the text into PDF as any COSString would choose to represent it. This is not the right thing to do. When the font is a CID keyed font, every glyph is 16 bit wide by definition, and COSString won't necessarily notice and write it correctly.

          Not quite: every CID can be up to 16-bits wide, but many (or for < 256 glyphs, all) will fit inside 8 bits. The byte-width of a string is controlled by whether or not it starts with a BOM, not which font it uses the current font's CMap but is always 16-bits with TTF.

          Therefore, drawString() must know what font is currently being drawn, and ask that font to encode the String to whatever byte sequence it takes to draw those glyphs. So, PDFont must be added to the drawString() API, and PDFont ought to have a method for "public byte[] encode(String)".

          drawString() is only valid after setFont() has been called, so it doesn't need adding to the API, we can just use the current font. PDFont#encode is a good idea, yes.

          PDFont needs a clearly specified API which performs java String to font-specific encoding transformation.

          Yes, as above.

          Observe that there are no methods in PDFont called decode(), and I have a hard time figuring out what any one of these methods actually do, because everything seems to be called "encode" or "lookup". It seems that the encode(byte[], int int) performs decoding, so it should be renamed such.

          Yes, I don't know if anybody knows what those methods are actually doing, including the original author.

          In general I'd recommend pushing the encode/decode job down to the font layer. Provide just two methods: "byte[] encode(String)" and "String decode(byte[])". Their job is to convert between the byte sequences required by that font and java Strings, and they handle full runs of text, not just single characters. They will then use single- or multibyte encodings as the font requires without the higher level having to do crazy stuff like processEncodedText() currently does in PDFStreamEngine.

          processEncodedText() is indeed crazy and needs fixing, but what you propose won't work because the 16-bit string encoding is not set by the font, it's set on a per-string basis by having that string start with a BOM.

          There are unfortunately very many ways to encode text in PDF, and especially if text needs to be decodable from the byte stream generated by other programs, the full complexity must be faced and implemented. These are to be solved in a case-by-case basis in the PDFont hierarchy. The PDFont highest class methods for encode and decode should be defined as abstract to reflect the fact that encoding depends on the particular subtype of the font.

          Yes, though as far as decoding the correct text is concerned all you have to do is make sure that the ToUnicode map is built correctly - you can put any old garbage in the actual strings (any many PDFs do).

          It may be that for some of these fonts the implementation is same because the actual mechanics can be handled by varying the Encoding instance, though.

          Maybe, though the Encoding class is for Type1 fonts (and equivalent, e.g. Type1C) only.

          Show
          John Hewson added a comment - - edited drawString() in PDPageContentStream just writes the text into PDF as any COSString would choose to represent it. This is not the right thing to do. When the font is a CID keyed font, every glyph is 16 bit wide by definition, and COSString won't necessarily notice and write it correctly. Not quite: every CID can be up to 16-bits wide, but many (or for < 256 glyphs, all) will fit inside 8 bits. The byte-width of a string is controlled by whether or not it starts with a BOM, not which font it uses the current font's CMap but is always 16-bits with TTF. Therefore, drawString() must know what font is currently being drawn, and ask that font to encode the String to whatever byte sequence it takes to draw those glyphs. So, PDFont must be added to the drawString() API, and PDFont ought to have a method for "public byte[] encode(String)". drawString() is only valid after setFont() has been called, so it doesn't need adding to the API, we can just use the current font. PDFont#encode is a good idea, yes. PDFont needs a clearly specified API which performs java String to font-specific encoding transformation. Yes, as above. Observe that there are no methods in PDFont called decode(), and I have a hard time figuring out what any one of these methods actually do, because everything seems to be called "encode" or "lookup". It seems that the encode(byte[], int int) performs decoding, so it should be renamed such. Yes, I don't know if anybody knows what those methods are actually doing, including the original author. In general I'd recommend pushing the encode/decode job down to the font layer. Provide just two methods: "byte[] encode(String)" and "String decode(byte[])". Their job is to convert between the byte sequences required by that font and java Strings, and they handle full runs of text, not just single characters. They will then use single- or multibyte encodings as the font requires without the higher level having to do crazy stuff like processEncodedText() currently does in PDFStreamEngine. processEncodedText() is indeed crazy and needs fixing, but what you propose won't work because the 16-bit string encoding is not set by the font, it's set on a per-string basis by having that string start with a BOM. There are unfortunately very many ways to encode text in PDF, and especially if text needs to be decodable from the byte stream generated by other programs, the full complexity must be faced and implemented. These are to be solved in a case-by-case basis in the PDFont hierarchy. The PDFont highest class methods for encode and decode should be defined as abstract to reflect the fact that encoding depends on the particular subtype of the font. Yes, though as far as decoding the correct text is concerned all you have to do is make sure that the ToUnicode map is built correctly - you can put any old garbage in the actual strings (any many PDFs do). It may be that for some of these fonts the implementation is same because the actual mechanics can be handled by varying the Encoding instance, though. Maybe, though the Encoding class is for Type1 fonts (and equivalent, e.g. Type1C) only.
          Hide
          Antti Lankila added a comment -

          Going to combine two posts into one...

          "You said that you were using "Identity-H for charcode -> CID, and Identity for CID -> GID", which doesn't involve updating any cmaps."

          Ah. I meant the cmap table in TTF actually. They do have cmaps which map from some specific encoding's values to glyph indexes. I can understand that my phrasing was confusing.

          Full ack on the CIDToGIDMap approach. That is a way to allow manipulating a font without having to re-encode text already written with the font.

          There must be some confusion about the 0x10000 CID limit. I simply meant that assuming a font contains a glyph which has unicode codepoint above 0x10000, it follows that rendering that glyph requires the CIDs to not be treated as UCS-2 values, because there is no way to represent that codepoint in UCS-2. I was mostly trying to weigh between different alternatives. I still like identity mappings because that means that conversion from unicode to appropriate GID is the simplest possible, at least for TTF fonts with Windows Unicode cmap table.

          On to the next one...

          "Not quite: every CID can be up to 16-bits wide, but many (or for < 256 glyphs, all) will fit inside 8 bits. The byte-width of a string is controlled by whether or not it starts with a BOM, not which font it uses."

          In my experience this is not the case. I know the standard says that PDF String encoding is controlled by a BOM appearing at the beginning, but this probably refers to other kinds of text, not the kind of text you print on a page! For instance, according to my testing, if you actually write text in CID keyed font, your BOM will be treated as CID and mapped to a character – or if you try to write with a font that is defined to have 8-bit characters, prepending it with a BOM just generates the BOM's characters in the text. It was this latter behavior that I spotted originally – I tried to generate the three dots ("…") character with PDFont.HELVETICA, and saw the BOM characters appear in the text string, along with extra spaces between glyphs that were the null bytes in UTF-16 encoding.

          Show
          Antti Lankila added a comment - Going to combine two posts into one... "You said that you were using "Identity-H for charcode -> CID, and Identity for CID -> GID", which doesn't involve updating any cmaps." Ah. I meant the cmap table in TTF actually. They do have cmaps which map from some specific encoding's values to glyph indexes. I can understand that my phrasing was confusing. Full ack on the CIDToGIDMap approach. That is a way to allow manipulating a font without having to re-encode text already written with the font. There must be some confusion about the 0x10000 CID limit. I simply meant that assuming a font contains a glyph which has unicode codepoint above 0x10000, it follows that rendering that glyph requires the CIDs to not be treated as UCS-2 values, because there is no way to represent that codepoint in UCS-2. I was mostly trying to weigh between different alternatives. I still like identity mappings because that means that conversion from unicode to appropriate GID is the simplest possible, at least for TTF fonts with Windows Unicode cmap table. On to the next one... "Not quite: every CID can be up to 16-bits wide, but many (or for < 256 glyphs, all) will fit inside 8 bits. The byte-width of a string is controlled by whether or not it starts with a BOM, not which font it uses." In my experience this is not the case. I know the standard says that PDF String encoding is controlled by a BOM appearing at the beginning, but this probably refers to other kinds of text, not the kind of text you print on a page! For instance, according to my testing, if you actually write text in CID keyed font, your BOM will be treated as CID and mapped to a character – or if you try to write with a font that is defined to have 8-bit characters, prepending it with a BOM just generates the BOM's characters in the text. It was this latter behavior that I spotted originally – I tried to generate the three dots ("…") character with PDFont.HELVETICA, and saw the BOM characters appear in the text string, along with extra spaces between glyphs that were the null bytes in UTF-16 encoding.
          Hide
          John Hewson added a comment - - edited

          I meant the cmap table in TTF actually. They do have cmaps which map from some specific encoding's values to glyph indexes. I can understand that my phrasing was confusing.

          Ok, that makes more sense! When the font is subset the cmap table will get rewritten, but that's not going to be a problem. It's basically internal to the font.

          There must be some confusion about the 0x10000 CID limit. I simply meant that assuming a font contains a glyph which has unicode codepoint above 0x10000, it follows that rendering that glyph requires the CIDs to not be treated as UCS-2 values, because there is no way to represent that codepoint in UCS-2. I was mostly trying to weigh between different alternatives. I still like identity mappings because that means that conversion from unicode to appropriate GID is the simplest possible, at least for TTF fonts with Windows Unicode cmap table.

          Perhaps we're making the same observation: that CIDs can't be used to represent all Unicode points, so identity mapping breaks at some point. The reason you can't really do an identity mapping to GID is that GID is the index of the glyph in the font, so if you had a font with a single Unicode character, say U+2265, you'd need 8,804 empty glyphs in the font prior to it. You can however do an identity mapping if you are willing to use GIDs in your strings but you'd need to re-encode your strings after subsetting the font in order to do this, which is a major hassle.

          I know the standard says that PDF String encoding is controlled by a BOM appearing at the beginning, but this probably refers to other kinds of text, not the kind of text you print on a page! For instance, according to my testing, if you actually write text in CID keyed font, your BOM will be treated as CID and mapped to a character – or if you try to write with a font that is defined to have 8-bit characters, prepending it with a BOM just generates the BOM's characters in the text. It was this latter behavior that I spotted originally – I tried to generate the three dots ("…") character with PDFont.HELVETICA, and saw the BOM characters appear in the text string, along with extra spaces between glyphs that were the null bytes in UTF-16 encoding.

          Yeah, looking at the spec you're right that the BOM doesn't apply to content stream text - I hadn't realised that. However, it seems that composite fonts can use encodings that are not fixed to 16 bits:

          "When the current font is composite, the text-showing operators shall behave differently than with simple fonts. For simple fonts, each byte of a string to be shown selects one glyph, whereas for composite fonts, a sequence of one or more bytes are decoded to select a glyph from the descendant CIDFont."

          It looks like the (PDF) CMap controls the code length:

          "The codespace ranges in the CMap (delimited by begincodespacerange and endcodespacerange) specify how many bytes are extracted from the string for each successive character code. A codespace range shall be specified by a pair of codes of some particular length giving the lower and upper bounds of that range. A code shall be considered to match the range if it is the same length as the bounding codes and the value of each of its bytes lies between the corresponding bytes of the lower and upper bounds. The code length shall not be greater than 4."

          I guess we just always generate 16-bit CMaps for composite fonts and be done with it.
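
          For reference, a fixed 16-bit code space is a single-range declaration in an embedded CMap. An illustrative fragment, not actual PDFBox output:

              1 begincodespacerange
              <0000> <FFFF>
              endcodespacerange

          A single two-byte range like this consumes every character code two bytes at a time, which is how the predefined Identity-H CMap behaves.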

          Antti Lankila added a comment -

          Ah... there are multiple ways to understand what "identity mapping" means. I've been using it in the sense that the PDF standard uses: Identity means f(x) = x, which implies that once CIDToGIDMap is Identity and the Encoding is Identity-H, all the character codes and CIDs are just GIDs. When I discuss the possibility that CID values would be constrained to be valid Unicode code points, I use phrasing such as "CIDs are UCS-2". In that case, of course, we would still have an Identity-H mapping at the character code -> CID layer, but not at the CID -> GID layer.
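
          To make the two identity layers concrete, here is an abbreviated sketch of the dictionaries involved (names illustrative; "..." stands for the remaining required entries such as /CIDSystemInfo and /FontDescriptor):

              << /Type /Font
                 /Subtype /Type0
                 /BaseFont /SomeFont
                 /Encoding /Identity-H          % character code -> CID
                 /DescendantFonts [ <<
                     /Type /Font
                     /Subtype /CIDFontType2
                     /CIDToGIDMap /Identity     % CID -> GID
                     ... >> ]
              >>

          With both layers set to identity, the two-byte codes in the content stream are the TrueType glyph indices themselves.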

          I believe that subsetting fonts is not a problem as long as the subsetting is not done after the fact by replacing the FontFile parameter. (Or if it is, then a CIDToGIDMap matching the new glyph IDs must be provided, as you pointed out.)

          Of course, this only applies to TrueType fonts. Some font types apparently define CIDs to have a particular meaning, and they come with their own CID-to-GID programs. I assume such fonts also provide a meaning for each CID that we could use, such as the Unicode value or PostScript name for the CID, or some predefined encoding map that defines all valid CIDs and their interpretation.

          You are right that the CMap will control the code length. I also can't see any good reason to generate anything but 16-bit codes – all that matters is that every glyph can be indexed, and I'm going to guess that there are no non-composite fonts with more than 65536 glyphs, so the generating side stays simple. However, existing PDF files could combine single-byte and multi-byte CMaps. Such CMaps must leave no ambiguity about which codespace range applies, so the ranges for 8-bit codes can't be prefixes of the 16-bit codes, and so on. That is rather complicated, and I doubt the current code (which is also pretty ugly to look at) handles it correctly – the CodespaceRanges are not sorted by length as far as I can see.
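
          A minimal sketch of the length-based matching the spec describes (a hypothetical class, not the current PDFBox CMap code):

              // Hypothetical: a codespace range matches a code only if the code
              // has the same byte length and each byte lies within the bounds.
              final class CodespaceRange
              {
                  private final byte[] low, high;   // bounding codes, equal length

                  CodespaceRange(byte[] low, byte[] high)
                  {
                      this.low = low;
                      this.high = high;
                  }

                  boolean matches(byte[] code)
                  {
                      if (code.length != low.length)
                      {
                          return false;   // the length must match first
                      }
                      for (int i = 0; i < code.length; i++)
                      {
                          int b = code[i] & 0xFF;
                          if (b < (low[i] & 0xFF) || b > (high[i] & 0xFF))
                          {
                              return false;
                          }
                      }
                      return true;
                  }
              }

          A reader then has to try candidate lengths of 1 to 4 bytes at each string position, which is exactly where ambiguous mixed-length code spaces become dangerous.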

          John Hewson added a comment -

          Yep, that's fine.

          Philip Helger added a comment -

          Hi!
          So you had a long discussion on the details. Is there any planned date for adding an implementation to PDFBox?
          Thanks, Philip

          John Hewson added a comment - edited

          Yes, it's planned, but there is no date. Currently the parsing/rendering aspects of PDFBox are taking up most of the committers' time, so this issue will move rather slowly.

          Now that PDFBOX-2262 and PDFBOX-2149 are complete, the main pieces are in place. Embedding TrueType fonts will usually involve subsetting, which FontBox's TTFSubsetter should be able to do, but this is untested. We will also need some way to track which glyphs have been written to the document.
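
          A rough sketch of the tracking side, assuming TTFSubsetter ends up exposing something like add(int) and writeToStream(OutputStream) (unverified, as said above):

              import java.io.ByteArrayOutputStream;
              import java.io.IOException;
              import java.util.Set;
              import java.util.TreeSet;
              import org.apache.fontbox.ttf.TTFSubsetter;
              import org.apache.fontbox.ttf.TrueTypeFont;

              class GlyphTracker
              {
                  private final Set<Integer> usedCodePoints = new TreeSet<Integer>();

                  // call for every string shown with the font
                  void track(String text)
                  {
                      for (int i = 0; i < text.length(); )
                      {
                          int cp = text.codePointAt(i);
                          usedCodePoints.add(cp);
                          i += Character.charCount(cp);
                      }
                  }

                  // at save time, write a subset containing only the used glyphs
                  byte[] subset(TrueTypeFont ttf) throws IOException
                  {
                      TTFSubsetter subsetter = new TTFSubsetter(ttf);
                      for (int cp : usedCodePoints)
                      {
                          subsetter.add(cp);
                      }
                      ByteArrayOutputStream out = new ByteArrayOutputStream();
                      subsetter.writeToStream(out);
                      return out.toByteArray();
                  }
              }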

          Antti Lankila added a comment - edited

          I remain mildly confused about the subsetting. Why not just embed the entire font and render it as a CID-keyed font? I have a (misnamed) attachment on SourceForge with some functions that I hope the next jPod release will incorporate: http://sourceforge.net/p/jpodlib/patches/_discuss/thread/97a19659/a7dd/attachment/PDFBoxImprovements.java

          The Unicode support there works through the loadCIDFromTTF() method, which constructs the CID font and the Unicode CMap for copy-paste. Note that in jPod, fonts encode themselves through the mapping: the content stream generator calls the font's Encoding's encode(String) method to generate the byte sequences embedded in the document. This is the API that PDFBox must adopt, if it hasn't already. (PDFBox also wants a decode() method, I guess, but I did not provide one because it was not necessary for solving my immediate problem.)
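
          For a TrueType font behind Identity-H, such an encode(String) boils down to one cmap lookup per code point. A sketch, where FontBox's CmapSubtable#getGlyphId is an assumption:

              import java.io.ByteArrayOutputStream;
              import org.apache.fontbox.ttf.CmapSubtable;

              // Sketch: map each Unicode code point through the font's Unicode
              // cmap to a glyph ID and emit it as a 2-byte code (Identity-H).
              byte[] encode(String text, CmapSubtable unicodeCmap)
              {
                  ByteArrayOutputStream out = new ByteArrayOutputStream();
                  for (int i = 0; i < text.length(); )
                  {
                      int cp = text.codePointAt(i);
                      int gid = unicodeCmap.getGlyphId(cp);   // 0 means .notdef
                      out.write(gid >> 8);
                      out.write(gid & 0xFF);
                      i += Character.charCount(cp);
                  }
                  return out.toByteArray();
              }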

          Philip Helger added a comment -

          Thanks for clarifying things.
          Font subsetting has the practical advantage that the created PDF file is not as large.
          iText and Word also use subsetting.

          I'm eagerly awaiting the possibility to write Unicode text to PDF in a simple way.

          John Hewson added a comment -

          A typical TTF is around 300KB for a single style, so a PDF using regular/bold/italic would be 900KB. It's much worse for Asian fonts, which are 10-30MB per style. We already have a subsetter in FontBox, currently unused, so presumably all PDFBox needs to do is track which glyphs are used.

          Embedding the TTF as a CIDFont and building the ToUnicode CMap as you mention should be fairly simple. We do indeed need an #encode method on PDFont (or perhaps some builder class if we're doing subsetting). Decode is now provided in the form of PDFont#readCode, the result of which may be passed to PDFont#toUnicode.
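
          The decode direction then looks roughly like this, using the two methods just mentioned (sketch only):

              import java.io.ByteArrayInputStream;
              import java.io.IOException;
              import java.io.InputStream;
              import org.apache.pdfbox.pdmodel.font.PDFont;

              // Sketch: turn the raw bytes of a content-stream string into text.
              String decode(PDFont font, byte[] stringBytes) throws IOException
              {
                  InputStream in = new ByteArrayInputStream(stringBytes);
                  StringBuilder sb = new StringBuilder();
                  while (in.available() > 0)
                  {
                      int code = font.readCode(in);          // consumes 1..n bytes
                      String unicode = font.toUnicode(code); // null if unmapped
                      if (unicode != null)
                      {
                          sb.append(unicode);
                      }
                  }
                  return sb.toString();
              }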

          Andreas Lehmkühler added a comment -

          Embedding the TTF as a CIDFont and building the ToUnicode CMap as you mention should be fairly simple.

          Apache FOP uses CIDFonts; maybe we should have a look at their code.

          John Hewson added a comment -

          Good idea, this code looks relevant.
          ASF subversion and git services added a comment -

          Commit 1645068 from John Hewson in branch 'pdfbox/trunk'
          [ https://svn.apache.org/r1645068 ]

          PDFBOX-922: Encode content stream text using PDFont

          John Hewson added a comment -

          I've added an encode() method to PDFont, as discussed. This is now used when writing strings to the content stream, rather than encoding them as ISO-8859-1. I've implemented this method for PDTrueTypeFont, PDType1Font, and PDCIDFontType2. Note that PDTrueTypeFont still hardcodes WinAnsiEncoding.

          ASF subversion and git services added a comment -

          Commit 1645080 from John Hewson in branch 'pdfbox/trunk'
          [ https://svn.apache.org/r1645080 ]

          PDFBOX-922: Cleanly limit PDTrueTypeFont to WinAnsiEncoding

          John Hewson added a comment -

          We now have support for embedding Type0/CIDFontType2 fonts, due to PDFBOX-2524. This provides full Unicode support for embedding TTF fonts via PDType0Font. We still need to build a ToUnicode CMap though, for copy & paste.

          I've kept PDTrueTypeFont's limit of only supporting WinAnsiEncoding, but cleaned up the code to throw an exception if text outside of that range is encoded. Simple fonts were not designed for use with Unicode, so this is probably for the best.

          John Hewson added a comment -

          Building of ToUnicode CMaps was added in https://svn.apache.org/r1645083 as part of PDFBOX-2524. We now have full Unicode support for embedded TTFs using PDType0Font#load.
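
          A minimal end-to-end sketch against the trunk API (method names as of this writing; any Unicode-capable TTF will do):

              PDDocument doc = new PDDocument();
              PDPage page = new PDPage();
              doc.addPage(page);

              // load the TTF as a Type0/CIDFontType2 font, Identity-H encoded
              PDFont font = PDType0Font.load(doc, new File("arial.ttf"));

              PDPageContentStream cs = new PDPageContentStream(doc, page);
              cs.beginText();
              cs.setFont(font, 12);
              cs.newLineAtOffset(100, 700);
              cs.showText("Γειά σου, κόσμε");   // Greek: "Hello, world"
              cs.endText();
              cs.close();

              doc.save("unicode.pdf");
              doc.close();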

          John Hewson added a comment -

          I've opened a follow-up issue, PDFBOX-2565, for subsetting the embedded TTF font file.


            People

            • Assignee: Unassigned
            • Reporter: Thanos Agelatos
            • Votes: 16
            • Watchers: 23
