Details
- Bug
- Status: Closed
- Minor
- Resolution: Fixed
- 1.8.3
- None
Description
This bug was introduced in 1.8.3 by PDFBOX-1213, which added HTML style information.
Bold style tags are opened correctly, but the closing tags are HTML-escaped.
~/work/pdfbox ((1.8.3))$ java -jar app/target/pdfbox-app-1.8.3.jar ExtractText -html -nonSeq -console pdftest.pdf
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><head><title>1725.PDF</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div style="page-break-before:always; page-break-after:always"><div><p>E:\M55\!\1725.fm 2003-01-01 18:15 P Tagg, IPM, University of Liverpool </p>
<p><b>A VERY SMALL PDF FILE </b></p>
<p><b>A VERY SMALL PDF FILE </b></p>
<p><b>A VERY SMALL PDF FILE </b></p>
<p><b>A VERY SMALL PDF FILE </b></p>
<p><b>A VERY SMALL PDF FILE </b></p>
<p><b>A VERY SMALL PDF FILE</b></p>
</div></div>
</body></html>
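One way a symptom like this can arise is when the closing tag is concatenated with the text before the text is HTML-escaped, so the tag gets escaped along with it. The Java sketch below illustrates that pattern and a possible fix. The class and method names are hypothetical and are not taken from the PDFBox source; the actual cause in 1.8.3 may differ.

```java
// Hypothetical illustration only; this is NOT the PDFBox implementation.
final class BoldTagEscapeSketch {

    /** Escapes the characters that are significant in HTML text content. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Buggy variant: the close tag is appended to the text before escaping. */
    static String boldBuggy(String text) {
        return "<b>" + escape(text + "</b>");
    }

    /** Fixed variant: only the text is escaped; both tags are written raw. */
    static String boldFixed(String text) {
        return "<b>" + escape(text) + "</b>";
    }

    public static void main(String[] args) {
        System.out.println(boldBuggy("A VERY SMALL PDF FILE "));
        System.out.println(boldFixed("A VERY SMALL PDF FILE "));
    }
}
```

In the buggy variant the opening tag is emitted as markup while the closing tag is emitted as escaped text, which matches the behaviour described in this report.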
Attachments
Issue Links
- is broken by: PDFBOX-1213 Adding style information to the PDF to HTML converter (Closed)