Description
This patch modifies the PDF-to-HTML conversion to add style information (bold, italic, and font size) to the resulting file. We have also removed the "DOCTYPE" header, because some parsers throw the following exception:
[Fatal Error] loose.dtd:31:3: The declaration for the entity "HTML.Version" must end with '>'.
org.xml.sax.SAXParseException: The declaration for the entity "HTML.Version" must end with '>'.
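The two changes described above can be sketched in plain JDK code. This is not the patch itself and does not use the PDFBox API: `StyledRun`, `toHtml`, `toDocument` and `parse` are hypothetical names chosen for illustration, and the sketch assumes the converter already knows each text run's bold/italic flags and font size. Each run is wrapped in a `<span>` carrying the font size plus `<b>`/`<i>` as needed. Only the text is escaped, never the generated tags. The page is emitted without a DOCTYPE line, so the JDK's DOM parser has no external DTD to resolve. It requires Java 16+ for the record.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class StyledHtmlSketch {

    // Hypothetical run of text sharing one style (not a PDFBox type).
    record StyledRun(String text, boolean bold, boolean italic, float fontSize) {}

    // Escape markup characters in the text only, so content cannot close a tag early.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap one run in <span style="font-size:...pt">, adding <b> and <i> when set.
    static String toHtml(StyledRun run) {
        String body = escape(run.text());
        if (run.italic()) body = "<i>" + body + "</i>";
        if (run.bold()) body = "<b>" + body + "</b>";
        return "<span style=\"font-size:" + run.fontSize() + "pt\">" + body + "</span>";
    }

    // Build the page with no DOCTYPE line, so an XML parser never tries to load loose.dtd.
    static String toDocument(List<StyledRun> runs) {
        StringBuilder sb = new StringBuilder("<html><head><title>converted</title></head><body><p>");
        for (StyledRun run : runs) {
            sb.append(toHtml(run));
        }
        return sb.append("</p></body></html>").toString();
    }

    // Parse the output with the JDK's DOM parser to confirm it is well-formed.
    static Document parse(String html) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String html = toDocument(List.of(
                new StyledRun("Title", true, false, 14f),
                new StyledRun(" a < b ", false, true, 10f)));
        System.out.println(html);
        // Number of <b> elements found after parsing the generated page.
        System.out.println(parse(html).getElementsByTagName("b").getLength());
    }
}
```

Nesting `<i>` inside `<b>` inside the size `<span>` keeps every element properly closed, which is what lets a strict XML parser accept the result once the DOCTYPE is gone.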
Attachments
Issue Links
- breaks
  - PDFBOX-1860 HTML converter escapes formatting close tags (Closed)
- is duplicated by
  - PDFBOX-213 Text Extraction with Formatting (Closed)