Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.18
None
None
Description
When running `java -jar pdfbox-app-2.0.9.jar ExtractText -html 10fu7MbhtFooYKpV2M9XBW.pdf result.html`, the bold text appears inside `<b>` tags, whereas the HTML produced by Tika Server 1.18 omits those tags. Is this expected, and is there any way to make Tika's output match the PDFBox result?
A sample PDF is attached. The question concerns the first line, which contains "Exhibit 10.2".
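
To check the difference mechanically rather than by eye, something like the following could be run against both HTML outputs. This is only a rough sketch using Python's standard `html.parser`. The names `BoldTextCollector` and `is_bold` are made up for illustration, and the two sample strings are invented stand-ins, not the actual output of PDFBox or Tika.

```python
from html.parser import HTMLParser


class BoldTextCollector(HTMLParser):
    """Collects text that appears inside <b> or <strong> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # current nesting depth of bold elements
        self.bold_chunks = []   # text fragments seen while depth > 0

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("b", "strong") and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.bold_chunks.append(data)


def is_bold(html, phrase):
    """Return True if `phrase` occurs within bold-tagged text in `html`."""
    collector = BoldTextCollector()
    collector.feed(html)
    collector.close()
    return phrase in "".join(collector.bold_chunks)


# Invented sample strings (assumptions, not real tool output).
with_bold = "<p><b>Exhibit 10.2</b></p>"
without_bold = "<p>Exhibit 10.2</p>"

print(is_bold(with_bold, "Exhibit 10.2"))
print(is_bold(without_bold, "Exhibit 10.2"))
```

In practice the contents of `result.html` and the HTML returned by Tika Server would be read from files and passed to `is_bold` in place of the sample strings.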