Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0-incubator
    • Fix Version/s: 1.0.0
    • Component/s: FontBox, PDModel
    • Labels:
      None

      Description

      PDFBox should support embedded font types, most prominently the Adobe CFF/Type2 (aka Type1C) font type. The desired functionality includes both glyph metrics (for PDF text extraction using org.apache.pdfbox.util.PDFTextStripper) and glyph painting (for PDF rendering using org.apache.pdfbox.pdfviewer.PageDrawer).

      I have implemented the basics of the Adobe CFF/Type2 font specification. If the other project members find my work substantial, I would like to see it incorporated into the FontBox/PDFBox projects. Please see the attached patch files.

      Design considerations. A PDF FontFile3 stream can be parsed into CFFFont objects by class CFFParser. CFFFont contains a map of glyph names to Type2 charstrings, which can be converted to Type1 charstrings by class CharStringConverter and rendered by class CharStringRenderer. Glyph metrics are obtained by formatting the result as AFM by class AFMFormatter, which plugs nicely into the existing PDFBox infrastructure. Glyph painting is achieved by formatting the result as a PostScript Type1 font by class Type1FontFormatter, which can be loaded via java.awt.Font#createFont(int, InputStream).

      The current implementation supports neither synthetic CFF fonts nor CID-keyed CFF fonts. Also, the conversion of certain Type2 features (stemming, hinting, flex) is missing.

      1. without-cff.png
        140 kB
        Villu Ruusmann
      2. with-cff.png
        163 kB
        Villu Ruusmann
      3. fontbox-r818793.patch
        123 kB
        Villu Ruusmann
      4. pdfbox-r823839.patch
        19 kB
        Villu Ruusmann

          Activity

          Villu Ruusmann added a comment -

          Patch files against the head revisions of trunk branches as of October 16, 2009.

          The code makes use of Java 1.5 language features, which conflicts with the current FontBox/PDFBox project style.

          Villu Ruusmann added a comment -

          Captured the output of org.apache.pdfbox.pdfviewer.PageDrawer before (without-cff.png) and after (with-cff.png) applying the patches.

          Villu Ruusmann added a comment -

          The idea of converting Adobe CFF/Type2 fonts to PostScript Type1 fonts has been aired before:
          http://www.mail-archive.com/pdfbox-dev@incubator.apache.org/msg01674.html

          Villu Ruusmann added a comment -

          A few technical observations regarding java.awt.Font#createFont(int, InputStream) that might come in handy for other Type1 font developers.

          Line terminators must be UNIX-style '\n'. For example, one cannot use java.io.PrintStream#println for outputting lines of text, because it emits different line terminators on different platforms. This is issue #6609143 in Java Bug Database:
          http://bugs.sun.com/view_bug.do?bug_id=6609143
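
The line-terminator point above can be illustrated with a minimal sketch. The class name `UnixLines` and the sample font lines are hypothetical and not taken from the attached patches; the only idea shown is appending an explicit '\n' rather than relying on the platform line separator that println uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the attached patches.
final class UnixLines {

    // Joins the given lines, terminating each one with a single '\n'
    // regardless of the platform's line.separator setting.
    static byte[] join(String... lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Illustrative content only; a real Type1 font program has many more lines.
        byte[] data = join("%!FontType1-1.0: Example 1.0", "/FontName /Example def");
        System.out.println(data.length + " bytes, all lines end in LF");
    }
}
```

The resulting byte array could then be wrapped in a java.io.ByteArrayInputStream and handed to java.awt.Font#createFont(int, InputStream).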

          Glyph names must conform to standard encoding. java.awt.Graphics#drawString will not paint a glyph whose name is non-conforming. For example, "zero", "colon", "a" and "A" are all valid glyph names, while "Zcaron", "H22107" and "pi121" are not. The solution is to rename the latter to Unicode glyph names "uniXXXX", where XXXX is the hexadecimal value of the glyph code.
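
A minimal sketch of the renaming rule described above. The class `GlyphNames`, the method `safeName` and the four-entry name set are illustrative assumptions; a real implementation would need the complete list of standard glyph names, and this sketch assumes the glyph code fits in four hexadecimal digits.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the attached patches.
final class GlyphNames {

    // Tiny illustrative subset of conforming names; NOT the full standard encoding.
    private static final Set<String> STANDARD =
            new HashSet<String>(Arrays.asList("zero", "colon", "a", "A"));

    // Returns the name unchanged if it is conforming, otherwise "uniXXXX",
    // where XXXX is the glyph code as four uppercase hexadecimal digits.
    static String safeName(String name, int code) {
        if (STANDARD.contains(name)) {
            return name;
        }
        return String.format(Locale.ROOT, "uni%04X", code);
    }

    public static void main(String[] args) {
        System.out.println(safeName("zero", 0x0030));
        System.out.println(safeName("Zcaron", 0x017D));
    }
}
```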

          Andreas Lehmkühler added a comment -

          Sounds really interesting. I'll have a deeper look at it in a few days.

          For now, thanks for the contribution!!

          Is it possible to attach the mentioned PDF as well?

          Villu Ruusmann added a comment -

          Replaced old patch files with new ones.

          A number of edge and corner cases have been identified and resolved while testing against an extended set of PDF documents.

          Andreas Lehmkühler added a comment -

          I've run a first test and it looks quite good.

          Villu Ruusmann added a comment -

          Fixed the way a Type1C font program's built-in encoding is overridden by an Encoding entry in the PDF font dictionary.

          Jukka Zitting added a comment -

          The new files you add in the patch come with headers saying "Copyright (c) 2009 Villu Ruusmann". At Apache we prefer not to state individual copyrights in the source files as keeping track of the copyrights becomes quite troublesome when many people modify the same files over time. Instead we opt for a more generic license header as described in http://www.apache.org/legal/src-headers.html. Is it OK for you if we replace your copyright headers with the standard Apache license headers?

          Villu Ruusmann added a comment -

          It is OK by me if you want to replace the current copyright notice "Copyright (c) 2009 Villu Ruusmann" with the Apache license header. However, I would like to see my name retained as an @author.

          I read through the "ASF Source Header and Copyright Notice Policy" and I'm not sure whether I understood it correctly: must I carry out this replacement myself and resubmit the patches?

          Andreas Lehmkühler added a comment -

          IMHO, it's sufficient to give us permission to change the header (I've already started with that), as described in item 3 on the page Jukka and you mentioned:

          "Source File Headers for Code Developed at the ASF

          0. This section refers only to works submitted directly to the ASF by the copyright owner or owner's agent.
          1. If the source file is submitted with a copyright notice included in it, the copyright owner (or owner's agent) must either:
             1. remove such notices, or
             2. move them to the NOTICE file associated with each applicable project release, or
             3. provide written permission for the ASF to make such removal or relocation of the notices."

          By using JIRA, your permission is directly connected to your patch, which should fulfill our need to have your permission on the record.

          Villu Ruusmann added a comment -

          As the rightful copyright owner, I hereby grant the ASF permission to remove the current copyright notices "Copyright (c) 2009 Villu Ruusmann" and add the standard Apache license headers.

          Andreas Lehmkühler added a comment -

          I've added the FontBox part in revision 907461.

          I've replaced the copyright notice with the ASF license as discussed. Villu's name is retained as author. I've also added a lot of javadocs.

          Andreas Lehmkühler added a comment -

          I've added the PDFBox part in revision 907804.

          I've replaced the copyright notice with the ASF license as discussed. Villu's name is retained as author. I've also added a lot of javadocs.

          The Hudson build revealed a Java 6 dependency in FontBox:

          [INFO] Compilation failure
          .../src/main/java/org/apache/fontbox/cff/CharStringRenderer.java:[122,12]
          moveTo(float,float) in java.awt.geom.GeneralPath cannot be applied to
          (double,double)
          .../src/main/java/org/apache/fontbox/cff/CharStringRenderer.java:[129,12]
          lineTo(float,float) in java.awt.geom.GeneralPath cannot be applied to
          (double,double)

          We should fix that before the next release.
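
For context, java.awt.geom.GeneralPath only gained moveTo(double,double) and lineTo(double,double) in Java 6; earlier versions accept float arguments only. One way to stay compatible with Java 5 is to narrow the coordinates explicitly, as in the sketch below. The class `Java5PathCalls` is hypothetical, and this is not necessarily the change that was eventually committed.

```java
import java.awt.geom.GeneralPath;

// Hypothetical illustration, not the actual CharStringRenderer code.
final class Java5PathCalls {

    // Builds a single line segment using only the float overloads,
    // which exist in both Java 5 and Java 6.
    static GeneralPath line(double x0, double y0, double x1, double y1) {
        GeneralPath path = new GeneralPath();
        path.moveTo((float) x0, (float) y0);   // explicit narrowing cast
        path.lineTo((float) x1, (float) y1);
        return path;
    }

    public static void main(String[] args) {
        GeneralPath path = line(0.0, 0.0, 10.5, 20.25);
        System.out.println(path.getCurrentPoint());
    }
}
```

The cast loses precision for coordinates that are not exactly representable as float, which is usually acceptable for glyph outlines.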

          Andreas Lehmkühler added a comment -

          The Java 6 dependencies have been removed. I've added some comments to the code marking things to look at after the release.

          At this point everything is done, so I am setting this to resolved.

          The CFF/Type2 support is a great improvement to PDFBox. Thanks to Villu for his effort.

          Andreas Lehmkühler added a comment -

          Closed after releasing version 1.0.0.


            People

            • Assignee:
              Andreas Lehmkühler
              Reporter:
              Villu Ruusmann