Apache OpenOffice (AOO) Bugzilla – Issue 63015
PDF-Export and Type1-Fonts: Error exporting umlaut
Last modified: 2020-07-05 11:49:05 UTC
Hello! You can export Writer documents to PDF with Type1 fonts in OO 2.0.2 (see issue 62307). But this doesn't seem to work with German umlauts like äöü. I tested it only in the US English version, but since the umlauts are displayed correctly in the document, they should appear correctly in the PDF as well. I attached a sample document and the resulting PDF. Greetings, Martin
Created attachment 34732 [details] Original dokument with some umlauts
Created attachment 34733 [details] Resulting PDF with broken umlauts
Reassigned to HI.
Confirmed it with the German version as well (2.0.2)
The root cause of the problem is that the font "TheMixExtraBold" doesn't contain these umlauts. This is rather unfortunate considering that the font seems to be designed in Germany. The best workaround for problems with fonts not supporting characters you want to use is to select another font which supports the text. Though OOo contains rescue mechanisms for situations like this, the chances that the missing character is replaced by the corresponding character from another font which matches the style of the selected font 100% are pretty dim, especially when the style of the selected font is rather unique. OOo's heuristic for dealing with fonts that don't support some characters usually works quite well for TrueType or OpenType fonts, and also for Type1 fonts on Unix platforms. OOo currently cannot detect whether a character is supported in a Type1 font on Windows platforms, so this is the worst-case situation for OOo. Maybe we can improve the heuristic for Type1 fonts on Windows.
Hi hdu and thanks for your reply. The font used in the example ("TheMixExtraBold") definitely does contain the umlauts. I see them in Writer when I select "TheMixExtraBold" and type them, and judging by their shape the umlauts can only come from this font. So IMHO the umlauts are really inside the font. If OpenOffice Writer can display those characters without any problems, they should be exported to PDF correctly. Unfortunately this font is licensed, so I'm not able to provide a copy of it as a font file.
> I see them in Writer when selecting "TheMixExtraBold" and type them
ATM is responsible for rasterizing and displaying Type1 fonts on Windows. When ATM sees that a glyph is empty it seems to do something like glyph fallback. To OOo, Windows' GDI subsystem looks like a black box. OOo doesn't and shouldn't know what the different Windows subsystems do under the hood. OOo doesn't implement DMA access to the hard disks either...
> So IMHO the umlauts are really inside the font.
I know the font; the umlauts are not in there. Have a look with a good font viewer.
Thanks again for your reply! I think I learned something new (never thought GDI would be capable of "simulating" umlauts).
I was just able to create a PDF containing TheMixExtraBoldCaps umlauts with GhostScript and a Windows PS printer. Perhaps this might give you a clue how to solve the problem.
This is interesting. Can you attach the corresponding PDF from Ghostscript? I just analyzed this problem some more with our PDF expert. It looks like the deeper problem is that the font claims it has StandardEncoding, which doesn't contain some characters like umlauts or other symbols, but the font does have them. So OOo does embed the font and the correct codes for umlauts into the PDF, but for some reason it doesn't come together. If there is a workaround for this situation it has to be handled by our PDF export. Reassigning to PL.
Created attachment 36547 [details] PDF created with GhostScript which contains the umlauts correctly.
*** Issue 64070 has been marked as a duplicate of this issue. ***
Looking at attached PDF which was produced by ghostscript I see that the font is reencoded/subsetted with a WinAnsiEncoding, which covers the umlauts...
target
I tested with another PostScript font (one that has umlaut support) and version 2.0.3 (win32 platform), and I can only confirm the behaviour: PDF Export does not export these umlauts and other language-specific characters: ÄÜÖäüöß. A workaround is to install a PDF printer like PDFCreator. It exports a really nice and much smaller PDF from the same document.
I have a problem that may or may not be related: I make a document with form fields and I export it to PDF, and then when filling in the form with Adobe Reader, any text fields will show black dots instead of some Czech accented characters. Curiously, some Czech characters work, but some don't. Is this related? Should I file a new bug for this?
vdvo: that is basically issue 42985 (for which there is no solution yet). The characters that do not work are most certainly those not in the WinAnsiEncoding.
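The observation that some Czech characters work while others don't can be checked quickly. The following is only a minimal sketch, assuming that Python's cp1252 codec is a close enough stand-in for PDF's WinAnsiEncoding (the two differ in a few code points, so treat the result as an approximation). The function name split_by_winansi is made up for this illustration.

```python
def split_by_winansi(text):
    """Split characters into those cp1252 can encode and those it cannot.

    Assumption: cp1252 approximates PDF's WinAnsiEncoding.
    """
    covered, missing = [], []
    for ch in text:
        try:
            ch.encode("cp1252")
            covered.append(ch)
        except UnicodeEncodeError:
            missing.append(ch)
    return "".join(covered), "".join(missing)


covered, missing = split_by_winansi("áčďéěíňóřšťúůýž")
print("covered:", covered)
print("missing:", missing)
```

Under that assumption, the characters reported as missing are the ones that would be expected to show up as dots in the form fields.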
Created attachment 38358 [details] another writer file in Swedish. having the same problem
Created attachment 38359 [details] the resulting PDF, the Swedish letters with character missing.
It appears that the document I received from a user suffers from the same problem. Unfortunately on Linux FC5 I could reproduce it, because the AGaramond Type1 font used gets converted to Times New Roman and the error disappears. Strange that AGaramond is missing the umlaut though. CC myself.
I should have said "...on Linux FC5 I could not reproduce it..."
*** Issue 73347 has been marked as a duplicate of this issue. ***
prio
*** Issue 75707 has been marked as a duplicate of this issue. ***
pl->hdu: the real problem here is that the PDF code cannot know the real code for these characters due to the somewhat "simplistic" implementation of WinSalGraphics::GetFontEncodingVector. This method should output the characters that are not encoded in the standard encoding together with their Adobe names, but it doesn't, so the PDF code cannot contain them. However, I don't know whether you have a chance to get those pairs on Windows. Please have a look.
@pl: we could cook up a simple parser for the pfb's eexec section...
retargeted to 2.4
Decrypting and parsing the Type1 eexec string isn't implemented yet and this will take a while. A workaround for psprint till this happens would be to use the adobe glyph names for them.
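For reference, the eexec decryption step itself is small. Below is a minimal sketch of the cipher as described in the Adobe Type 1 Font Format specification (initial key 55665, constants 52845 and 22719, first four plaintext bytes discarded). It assumes the encrypted bytes have already been extracted from the binary segment of the .pfb file; the segment handling and the parsing of the decrypted PostScript are not shown, and the function names are made up for this illustration.

```python
EEXEC_KEY = 55665        # initial key for the eexec section
C1, C2 = 52845, 22719    # cipher constants from the Type 1 specification


def eexec_decrypt(cipher: bytes, key: int = EEXEC_KEY, skip: int = 4) -> bytes:
    """Decrypt an eexec-encrypted byte string and drop the leading random bytes."""
    r = key
    plain = bytearray()
    for c in cipher:
        plain.append(c ^ (r >> 8))
        r = ((c + r) * C1 + C2) & 0xFFFF
    return bytes(plain[skip:])


def eexec_encrypt(plain: bytes, key: int = EEXEC_KEY,
                  prefix: bytes = b"\0\0\0\0") -> bytes:
    """Inverse operation, included only so the sketch can be round-trip tested."""
    r = key
    out = bytearray()
    for p in prefix + plain:
        c = p ^ (r >> 8)
        out.append(c)
        r = ((c + r) * C1 + C2) & 0xFFFF
    return bytes(out)


sample = b"/odieresis"
print(eexec_decrypt(eexec_encrypt(sample)) == sample)
```

The decrypted section would still have to be scanned for the CharStrings dictionary to learn which glyph names the font really contains.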
*** Issue 107264 has been marked as a duplicate of this issue. ***
Please allow me to add my 2 cents: You don't have to decode the encrypted part of the Type1 font. I'm not aware of any quality font without glyphs for umlauts, accented characters, etc. (except symbol fonts), so you won't get any helpful information by decoding the font. In fact you have to specify the encoding (the mapping of font glyph names to characters). In the Ghostscript example this is done with the tag "/Encoding /WinAnsiEncoding" within the Font object. In the broken example this tag is missing. And as the PDF reader has no idea that the character 246 (ö) should be represented by the glyph /odieresis, it is shown as a space. For Windows and ISO8859-1 character sets on Unix systems this encoding should work fine; if you need Eastern European characters as defined in ISO8859-2, you will have to specify your own encoding vector, as there are no predefined vectors in PDF for these character sets. I hope this helps.
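To illustrate what "your own encoding vector" could look like for ISO8859-2, here is a rough sketch that builds a PDF encoding dictionary with a /Differences array. The glyph-name table is hand-written for a few Czech letters and far from complete, the helper names are invented for this example, and placing the glyphs at their ISO8859-2 codes on top of WinAnsiEncoding is just one possible choice (it overrides whatever WinAnsiEncoding had at those codes). The font must of course actually contain the named glyphs.

```python
# Adobe glyph names for a few Czech letters that WinAnsiEncoding lacks.
# Hand-written for illustration only; not a complete table.
GLYPH_NAMES = {
    "č": "ccaron", "ď": "dcaron", "ě": "ecaron", "ň": "ncaron",
    "ř": "rcaron", "ť": "tcaron", "ů": "uring",
}


def differences_array(chars, codec="iso8859_2"):
    """Return a /Differences array placing each glyph at its code in `codec`."""
    pairs = sorted((ch.encode(codec)[0], GLYPH_NAMES[ch]) for ch in chars)
    return "[" + " ".join(f"{code} /{name}" for code, name in pairs) + "]"


def encoding_dict(chars):
    """Wrap the array in an encoding dictionary based on WinAnsiEncoding."""
    return ("<</Type/Encoding/BaseEncoding/WinAnsiEncoding"
            f"/Differences{differences_array(chars)}>>")


print(encoding_dict("čř"))
# <</Type/Encoding/BaseEncoding/WinAnsiEncoding/Differences[232 /ccaron 248 /rcaron]>>
```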
I have this problem too for a font that I just bought (Helvetica Neue LT, Windows PostScript version). There is no problem with, e.g., Nimbus Sans. Is more information needed to solve the issue? Looking at the PDF file, there is a difference in how Nimbus Sans and Helvetica Neue are embedded. For the first, the umlauted characters are embedded separately in addition to the complete set. The separate embedding looks like this:
    54 0 obj
    <</Type/Encoding/Differences[ 0 /Udieresis /Aring /Adieresis /Odieresis /aring /adieresis /odieresis]>>
    endobj
    [...]
    56 0 obj
    <</Type/Font/Subtype/Type1/BaseFont/NimbusSanL-Regu /Encoding 54 0 R /ToUnicode 55 0 R /FirstChar 0 /LastChar 6 /Widths[722 667 667 778 556 556 556 ] /FontDescriptor 51 0 R>>
    endobj
There is no such handling of the umlauted characters for Helvetica Neue. Comparing the .afm files, Nimbus Sans has, for example:
    C -1 ; WX 556 ; N aring ; B 42 -23 535 754 ;
and Helvetica Neue has:
    C 229 ; WX 574 ; N aring ; B -7 -14 522 778 ;
If you need more info, I'd be happy to provide it!
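The difference between the two .afm entries is the character code: in an AFM file, "C -1" marks a glyph that has no code in the font's default encoding, while "C 229" assigns it a code. A small sketch that reads the two lines quoted above (the parser is deliberately simplistic, handles only the C, WX and N fields, and assumes integer widths; parse_char_metric is a made-up name):

```python
def parse_char_metric(line):
    """Parse one AFM character-metric line into a small dict."""
    fields = {}
    for part in line.split(";"):
        tokens = part.split()
        if tokens:
            fields[tokens[0]] = tokens[1:]
    return {
        "code": int(fields["C"][0]),    # -1 means "not encoded"
        "width": int(fields["WX"][0]),
        "name": fields["N"][0],
    }


nimbus = parse_char_metric("C -1 ; WX 556 ; N aring ; B 42 -23 535 754 ;")
helvetica = parse_char_metric("C 229 ; WX 574 ; N aring ; B -7 -14 522 778 ;")
for label, metric in (("Nimbus Sans", nimbus), ("Helvetica Neue", helvetica)):
    status = "unencoded" if metric["code"] < 0 else f"encoded at {metric['code']}"
    print(f"{label}: /{metric['name']} is {status}")
```

This would fit the observation that the unencoded Nimbus Sans glyphs get a separate /Differences encoding, while the already-encoded Helvetica Neue glyphs do not.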
Some more experimenting based on astumpf's comment: In the PDF file I manually added:
    /Encoding/WinAnsiEncoding/Subtype/Type1
to the font object like this:
    17 0 obj
    <</Type/Font/Subtype/Type1/BaseFont/HelveticaNeueLT-Italic /Encoding/WinAnsiEncoding/Subtype/Type1 /ToUnicode 16 0 R /FirstChar 0 /LastChar 255
    /Widths[0 0 0 0 0 222 222 0 222 0 222 222 222 222 0 0 0 0 0 0 0 0 0 0 0 167 519 519 556 222 611 444 278 259 426 556 556 926 630 278 259 259 352 600 278 389 278 333 556 556 556 556 556 556 556 556 556 556 278 278 600 600 600 556 800 667 685 722 704 611 574 759 722 259 519 667 556 870 722 759 648 759 685 648 574 722 611 926 611 611 611 259 333 259 600 500 222 519 593 537 593 537 296 574 556 222 222 481 222 852 556 574 593 593 333 481 315 556 481 759 481 481 444 333 222 333 600 0 556 0 278 556 426 1000 556 556 222 1074 648 259 1074 0 0 0 0 278 278 426 426 500 500 1000 222 990 481 259 907 0 0 611 278 259 556 556 556 556 222 556 222 800 311 463 600 600 800 222 400 600 333 333 222 556 600 278 222 333 344 463 834 834 834 556 667 667 667 667 667 667 926 722 611 611 611 611 259 259 259 259 704 722 759 759 759 759 759 600 759 722 722 722 722 611 648 537 519 519 519 519 519 519 870 537 537 537 537 537 222 222 222 222 574 556 574 574 574 574 574 600 574 556 556 556 556 481 593 481 ]
    /FontDescriptor 15 0 R>>
    endobj
and it worked!!!
Adding CCs.
After digging through the PDF documentation, I found a pretty easy solution (for PDF-1.4). My Type 1 font is defined in the PDF file like this:
    9 0 obj <</Type/Font/Subtype/Type1/BaseFont/Syntax /ToUnicode 8 0 R /FirstChar 0 /LastChar 255 /Widths[ ....
When adding "/Encoding /WinAnsiEncoding" to it so that it becomes
    9 0 obj <</Type/Font/Subtype/Type1/BaseFont/Syntax /Encoding /WinAnsiEncoding /ToUnicode 8 0 R /FirstChar 0 /LastChar 255 /Widths[ ...
the umlauts show up. So please can we add this to PDF files created with Type1 fonts? According to the PDF Reference, Third Edition, version 1.4, linked here http://www.adobe.com/devnet/pdf/pdf_reference_archive.html, on pages 317-318 ("Entries in a Type 1 font dictionary") it says for /Encoding:
    (Optional) A specification of the font’s character encoding, if different from its built-in encoding. The value of Encoding may be either the name of a predefined encoding (MacRomanEncoding, MacExpertEncoding, or WinAnsiEncoding, as described in Appendix D) or an encoding dictionary that specifies differences from the font’s built-in encoding or from a specified predefined encoding (see Section 5.5.5, “Character Encoding”).
So as long as no encoding is set through other constraints, "/Encoding /WinAnsiEncoding" could be set. My font itself has "StandardEncoding" as a parameter, and this means pretty much nothing, as described on page 329. Regards, Martin
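The manual edit described above can be mimicked on the text of a font dictionary. This is only a sketch for experimenting: it works on a string holding one font dictionary, the function name add_winansi_encoding is invented here, and it does not touch the PDF's cross-reference table, whose byte offsets become stale once bytes are inserted into a real file.

```python
import re


def add_winansi_encoding(font_dict: str) -> str:
    """Insert /Encoding /WinAnsiEncoding right after the /BaseFont entry.

    The text is returned unchanged if it is not a Type 1 font dictionary
    or if it already carries an /Encoding entry.
    """
    compact = font_dict.replace(" ", "")
    if "/Subtype/Type1" not in compact or "/Encoding" in font_dict:
        return font_dict
    return re.sub(r"(/BaseFont\s*/[^\s/<>\[\]()]+)",
                  r"\1 /Encoding /WinAnsiEncoding", font_dict, count=1)


before = ("<</Type/Font/Subtype/Type1/BaseFont/Syntax "
          "/ToUnicode 8 0 R /FirstChar 0 /LastChar 255>>")
print(add_winansi_encoding(before))
```

As discussed elsewhere in this issue, symbol fonts ship with their own encoding and should not be given WinAnsiEncoding; the sketch makes no attempt to detect them.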
I found the solution to this. In vcl\source\gdi\pdfwriter_impl.cxx, in the function emitEmbeddedFont, line 3696:
    if( !pFont->IsSymbolFont() && pEncoding == 0)
must be changed to:
    if( !pFont->IsSymbolFont() )
Reason: Without the pEncoding check, "/Encoding/WinAnsiEncoding\n" is added to the PDF file, which is correct. pEncoding specifies that a ToUnicode stream has to be generated (and it is), and nothing speaks against it because it is only a translation table and doesn't affect the encoding itself. For symbolic fonts WinAnsiEncoding would be wrong because they ship with their own encoding. I don't want to create a patch and upload this myself because I don't intend to do more bugfixing on OpenOffice and it is too tiny to go through the whole upload process. So please, someone else do this; I don't want any rights on that code submission.
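The effect of the one-line change can be laid out as a truth table. The following is a toy model of the two conditions only, not the actual vcl code: the function names and the boolean has_encoding_vector (standing in for "pEncoding is not null") are assumptions made for this illustration.

```python
def emits_winansi_old(is_symbol_font: bool, has_encoding_vector: bool) -> bool:
    """Models: if( !pFont->IsSymbolFont() && pEncoding == 0 )"""
    return (not is_symbol_font) and (not has_encoding_vector)


def emits_winansi_new(is_symbol_font: bool, has_encoding_vector: bool) -> bool:
    """Models: if( !pFont->IsSymbolFont() )"""
    return not is_symbol_font


for symbol in (False, True):
    for vector in (False, True):
        print(f"symbol={symbol!s:5} vector={vector!s:5} "
              f"old={emits_winansi_old(symbol, vector)!s:5} "
              f"new={emits_winansi_new(symbol, vector)}")
```

In this model the only case that changes is a non-symbol font that has an encoding vector: the old condition leaves out the /Encoding entry, the new one writes it.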
"hdu" committed SVN revision 1631975 into trunk: #i63015# always default to WinAnsiEncoding for non-symbol PDF-Type1 export
Many thanks for debugging into it and pointing out the problematic source line. Sorry that the review took so long.
FINALLY fixed! We should include it in 4.1.2.
Accepted for 4.1.2.
"kschenk" committed SVN revision 1705192 into branches/AOO410: #i63015# Merged from trunk r 1631975
It would be great if one of the many people who are following this issue could take the time to download OpenOffice 4.1.2-RC2 (German: https://dist.apache.org/repos/dist/dev/openoffice/4.1.2-rc2-r1707648/binaries/de/ ) and comment on whether the bug is now fixed for the upcoming OpenOffice version 4.1.2. Thanks!
Can't verify since (on Linux) the sample document ("Original dokument..." in the Attachments above) already gets converted to PDF correctly with older versions of OpenOffice, such as 4.1.0. Still, it does work with 4.1.2-RC2 too.
I just downloaded 4.1.2 Build 9781 Rev. 1707648, tested it, and it works. Thanks for fixing it. We personally switched the problematic font to an OpenType font. As a remark for further development: the LibreOffice community has merged the PDF creation process for all platforms into one component, and after that the fix no longer works on Linux; see here: http://cgit.freedesktop.org/libreoffice/core/commit/?id=297b22bd49ea11a90063ab8503fb83090f351668
@edv: Thanks for the verification! Marking VERIFIED.
Closing.