Details
Bug
Status: Closed
Major
Resolution: Fixed
2.0.20
None
Windows10, 64bit
Description
An IOException is thrown when the attached PDF is fed into PDFBox.
The attached PDF file (JP.pdf) references the Adobe-Japan1-65534 CMap.
The source code is below.
—
import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.commons.io.FileUtils;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.text.PDFTextStripper;

public class pdfBoxTest {
    public static void main(String[] args) throws Exception {
        pdfBoxTest sample = new pdfBoxTest();
        String pdfname = "D:/tmp/jp.pdf";
        File pdf = FileUtils.getFile(pdfname);
        sample.extractTextFromPDF(pdf);
        sample.load(pdf);
    }

    // Not included in the original report; assumed here to be a plain PDFTextStripper call.
    public void extractTextFromPDF(File pdf) throws Exception {
        try (PDDocument document = PDDocument.load(pdf)) {
            System.out.println(new PDFTextStripper().getText(document));
        }
    }

    // Renders the first page at 300 DPI and writes it out as a JPEG.
    public void load(File pdf) throws Exception {
        try (PDDocument document = PDDocument.load(pdf)) {
            PDFRenderer renderer = new PDFRenderer(document);
            BufferedImage bufImage = renderer.renderImageWithDPI(0, 300, ImageType.RGB);
            ImageIO.write(bufImage, "jpg", new File("D:/tmp/jp.jpg"));
        }
    }
}
The getExternalCMap method in the CMapParser class tries to find the external CMap, but
it cannot find Adobe-Japan1-65534 and throws an exception.
I know that no such CMap exists, but other viewers open this PDF file without any problem,
so I think it would be better not to throw an exception and to use another CMap instead.
As a temporary workaround I modified the source code as below, and it works well.
protected InputStream getExternalCMap(String name) throws IOException {
    InputStream is = this.getClass().getResourceAsStream(name);
    if (is == null) {
        // Requested CMap not found: fall back to supplement 1 of the same collection.
        if (name.startsWith("Adobe-Japan1")) {
            name = "Adobe-Japan1-1";
        } else if (name.startsWith("Adobe-Korea1")) {
            name = "Adobe-Korea1-1";
        }
        is = this.getClass().getResourceAsStream(name);
        if (is == null) {
            throw new IOException("Error: Could not find referenced cmap stream " + name);
        }
    }
    return is;
}
However, this is only a workaround, not a proper fix.
If possible, I would like to ask you to modify the source code so that it does not throw an
exception when it cannot find a CMap.
I also found a Korean PDF file that references the Adobe-Korea1-3 CMap.
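The requested behaviour could look roughly like the sketch below. It is only an illustration under stated assumptions: `CMapFallback` and `resolve` are hypothetical names that do not exist in PDFBox, the set of available CMap names is passed in by the caller instead of being read from the library's resources, and falling back to supplement 1 simply mirrors the workaround above. The point is that a missing CMap yields an empty result the caller can handle, not an IOException.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper, not part of PDFBox: picks a substitute CMap name without throwing. */
public class CMapFallback {

    /**
     * Returns the requested name if it is available. Otherwise tries the same
     * "Registry-Ordering" prefix with supplement 1 (an assumed fallback rule,
     * taken from the workaround in this report). Returns Optional.empty()
     * when neither name is available.
     */
    public static Optional<String> resolve(String name, Set<String> available) {
        if (available.contains(name)) {
            return Optional.of(name);
        }
        int lastDash = name.lastIndexOf('-');
        if (lastDash > 0) {
            // Keep everything before the last dash and replace the supplement with 1.
            String fallback = name.substring(0, lastDash) + "-1";
            if (available.contains(fallback)) {
                return Optional.of(fallback);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed sample set of bundled CMap names, for illustration only.
        Set<String> bundled = new HashSet<>(Arrays.asList("Adobe-Japan1-1", "Adobe-Korea1-1"));
        System.out.println(resolve("Adobe-Japan1-65534", bundled));
        System.out.println(resolve("Adobe-Korea1-3", bundled));
        System.out.println(resolve("Unknown-CMap", bundled));
    }
}
```

A caller could log a warning and continue with the substitute when one is found, and decide separately what to do when the result is empty.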
Please refer to attached file.
Thanks!
//Okada