[PDFBOX-612] Unknown encoding for 'GBK-EUC-H' - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.8.0-incubator
Fix Version/s: 1.5.0
Component/s: PDModel
Labels: encoding
Environment: Windows

Description

PDFBox reports "Unknown encoding for 'GBK-EUC-H'" when processing a Chinese PDF document. Proposed fix:

1. Add a method to org.apache.pdfbox.pdmodel.font.PDFont.java:

public String getEncodingName() {
    COSBase encoding = font.getDictionaryObject(COSName.ENCODING);
    // Only a name-valued /Encoding entry yields a name; instanceof is false for null.
    if (encoding instanceof COSName) {
        return ((COSName) encoding).getName();
    }
    return null;
}
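
The name returned by such a method is a predefined CMap name from the PDF file, not a Java charset name, so a converter needs some lookup between the two. The following is a minimal sketch of that lookup, not PDFBox code: the class `CMapCharsetNames` and the method `javaCharsetFor` are hypothetical, and the table assumes that the GBK-EUC-H and GBK-EUC-V CMaps both correspond to the Java charset "GBK" (Microsoft code page 936).

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of PDFBox: maps predefined CMap names
// to Java charset names.
final class CMapCharsetNames {

    // Assumption: both the horizontal (-H) and vertical (-V) GBK CMaps
    // use the same byte encoding, which Java exposes as "GBK".
    private static final Map<String, String> JAVA_CHARSET_BY_CMAP = Map.of(
            "GBK-EUC-H", "GBK",
            "GBK-EUC-V", "GBK");

    private CMapCharsetNames() {
    }

    // Returns the Java charset name for a CMap name, or empty if the
    // name is null or not in the table.
    static Optional<String> javaCharsetFor(String cmapName) {
        if (cmapName == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(JAVA_CHARSET_BY_CMAP.get(cmapName));
    }
}
```

An empty result signals that no converter is known for the name, in which case a caller would presumably fall back to the plain CMap lookup.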

2. Modify the encode method, from:

if( retval == null && cmap != null )
{
    retval = cmap.lookup( c, offset, length );
}

//if we haven't found a value yet and
//we are still on the first byte and
//there is no cmap or the cmap does not have 2 byte mappings then try to encode
//using fallback methods.

to:

if( retval == null && cmap != null )
{
    String encodingStr = getEncodingName();
    if (encodingStr != null) {
        EncodingConverter converter = EncodingConversionManager.getConverter(encodingStr);
        if (converter != null) {
            // A single byte cannot be converted; wait for the two-byte call.
            if (length == 1) return null;
            retval = converter.convertBytes(c, offset, length, cmap);
        } else {
            retval = cmap.lookup( c, offset, length );
        }
    } else {
        retval = cmap.lookup( c, offset, length );
    }
}
//if we haven't found a value yet and
//we are still on the first byte and
//there is no cmap or the cmap does not have 2 byte mappings then try to encode
//using fallback methods.
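
To illustrate what a converter's byte conversion might do for GBK, here is a minimal, self-contained sketch. It is not the PDFBox `EncodingConverter` implementation: the class `GbkFallbackSketch` is hypothetical, and its `convertBytes` takes a charset name instead of a CMap. Unlike the snippet above, which returns null for every one-byte call, this sketch defers only when the byte is a GBK lead byte (0x81 to 0xFE), so plain ASCII still decodes. It also assumes the JVM ships the "GBK" charset; `Charset.forName` throws `UnsupportedCharsetException` otherwise.

```java
import java.nio.charset.Charset;

// Hypothetical sketch, not PDFBox code: decodes character codes with a
// Java charset when the font's encoding names a known CJK CMap.
final class GbkFallbackSketch {

    private GbkFallbackSketch() {
    }

    // Decodes `length` bytes of `c` starting at `offset`.
    // Returns null when called with a single GBK lead byte, so that the
    // caller can retry with two bytes, as in the encode method above.
    static String convertBytes(byte[] c, int offset, int length, String charsetName) {
        int first = c[offset] & 0xFF;
        // In GBK, bytes 0x81..0xFE start a two-byte character.
        if (length == 1 && first >= 0x81 && first <= 0xFE) {
            return null;
        }
        return new String(c, offset, length, Charset.forName(charsetName));
    }
}
```

For example, the two bytes 0xD6 0xD0 decode to the character 中 (U+4E2D), while a call with only the first byte returns null.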

Attachments


- 1DE9A100d01.pdf (12 kB), 25/Feb/11 06:58, Gang Luo
- PDFBOX612-1DE9A100d01.txt (4 kB), 03/Mar/11 17:08, Andreas Lehmkühler
- PDFBOX612-1DE9A100d011.png (573 kB), 03/Mar/11 17:08, Andreas Lehmkühler

People

Assignee: Andreas Lehmkühler
Reporter: Gang Luo
Votes: 0
Watchers: 0

Dates

Created: 08/Feb/10 00:34
Updated: 18/May/12 15:31
Resolved: 18/May/12 15:31