The improved font substitution mechanisms in 2.0 are not quite sufficient to handle all PDFs. Specifically, CJK substitution and the substitution of TTF fonts in place of CFF fonts are not possible with the current design.
The CJK problems can be seen in PDFBOX-2509 and in PDFBOX-2563, which does not solve the problem. Additional font API weaknesses can be found in PDFBOX-2578 and PDFBOX-2366. This meta-issue aims to address all of those sub-issues.
The current problems are:
- FontBox does not provide a generic font type, so we have to handle TrueTypeFont, CFFFont, and Type1Font separately. This hinders cross-format substitution.
- ExternalFonts has no knowledge of the CIDSystemInfo, which is necessary for CJK substitution.
- FontProvider contains too much public logic that should be internal to PDFBox, e.g. substitution logic. This makes it brittle and means we won't be able to add further logic, such as CJK substitution, after 2.0 is released.
- There is too much confusion about the role of ExternalFonts, particularly with regard to the mapping of built-in fonts and the distinction between a substitute font and a fallback font.
- ExternalFonts is a black box: the user cannot tell whether the font returned is an exact match, or a last-resort fallback.
- The font substitution API is confusing; users said they would prefer a flat file format.
- PDSimpleFont#getEncoding() can return null for TTFs that use a built-in encoding. This has caused many bugs; there must be a better way.
- We still have some confusing names; for example, what we call a CustomEncoding is known as a "built-in encoding" in the spec.
- There is no fallback CFF font; we resort to AdobeBlank instead, which renders no visible glyphs.
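To make several of the points above more concrete (no generic font type, no CIDSystemInfo in the lookup, substitution logic exposed publicly, and no way to tell an exact match from a fallback), here is a minimal sketch of the kind of API they suggest. All type and method names (GenericFont, NamedFont, CidSystemInfo, FontMapping, FontMapper, map) are hypothetical and do not correspond to existing PDFBox or FontBox classes. The lookup order (exact name, then CIDSystemInfo character collection, then a last-resort font) is likewise an assumption chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: none of these types exist in PDFBox or FontBox under these names. */
public class FontMappingSketch {

    /** Hypothetical common supertype for TrueType, CFF and Type 1 fonts. */
    public interface GenericFont {
        String getName();
        String getFormat(); // e.g. "TTF", "CFF" or "Type1"
    }

    /** Minimal GenericFont implementation used only for this illustration. */
    public static final class NamedFont implements GenericFont {
        private final String name;
        private final String format;
        public NamedFont(String name, String format) { this.name = name; this.format = format; }
        @Override public String getName() { return name; }
        @Override public String getFormat() { return format; }
    }

    /** Stand-in for the Registry and Ordering entries of a PDF CIDSystemInfo dictionary. */
    public static final class CidSystemInfo {
        private final String registry;
        private final String ordering;
        public CidSystemInfo(String registry, String ordering) { this.registry = registry; this.ordering = ordering; }
        /** Character collection key, e.g. "Adobe-Japan1". */
        public String collection() { return registry + "-" + ordering; }
    }

    /** Lookup result that tells the caller whether it got an exact match or a fallback. */
    public static final class FontMapping {
        private final GenericFont font;
        private final boolean fallback;
        FontMapping(GenericFont font, boolean fallback) { this.font = font; this.fallback = fallback; }
        public GenericFont getFont() { return font; }
        public boolean isFallback() { return fallback; }
    }

    /** Keeps the substitution rules internal: callers only see map(...). */
    public static final class FontMapper {
        private final Map<String, GenericFont> byName = new HashMap<>();
        private final Map<String, GenericFont> byCollection = new HashMap<>();
        private final GenericFont lastResort;

        public FontMapper(GenericFont lastResort) { this.lastResort = lastResort; }

        /** Registers a font; cidCollection may be null for non-CJK fonts. */
        public void addFont(GenericFont font, String cidCollection) {
            byName.put(font.getName(), font);
            if (cidCollection != null) {
                byCollection.putIfAbsent(cidCollection, font); // first font registered wins
            }
        }

        /** Exact name match first, then a CIDSystemInfo-based substitute, then the last resort. */
        public FontMapping map(String postScriptName, CidSystemInfo info) {
            GenericFont exact = byName.get(postScriptName);
            if (exact != null) {
                return new FontMapping(exact, false);
            }
            if (info != null) {
                GenericFont cjk = byCollection.get(info.collection());
                if (cjk != null) {
                    return new FontMapping(cjk, true);
                }
            }
            return new FontMapping(lastResort, true);
        }
    }

    public static void main(String[] args) {
        // The last resort is a TrueType font, so it can stand in for a missing CFF font.
        FontMapper mapper = new FontMapper(new NamedFont("LastResort", "TTF"));
        mapper.addFont(new NamedFont("Helvetica", "Type1"), null);
        mapper.addFont(new NamedFont("ExampleMincho", "TTF"), "Adobe-Japan1");

        FontMapping m1 = mapper.map("Helvetica", null);
        FontMapping m2 = mapper.map("MS-Mincho", new CidSystemInfo("Adobe", "Japan1"));
        FontMapping m3 = mapper.map("SomeMissingCFFFont", null);

        System.out.println(m1.getFont().getName() + " fallback=" + m1.isFallback()); // Helvetica fallback=false
        System.out.println(m2.getFont().getName() + " fallback=" + m2.isFallback()); // ExampleMincho fallback=true
        System.out.println(m3.getFont().getName() + " fallback=" + m3.isFallback()); // LastResort fallback=true
    }
}
```

Because map(...) returns the common font type, a TrueType font can replace a missing CFF font without format-specific branches, and the isFallback() flag lets the caller decide whether to log or warn about a substitution instead of treating the result as a black box.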