Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Description
I have a test document (the same one as on PDFBOX-1129). When I run it through ExtractText -html, the page number is extracted for each page. However, in each case the page number looks like this:
<p>N<p>Text of page N...
I.e., the <p> tag for the page number is never closed.
Possibly related: if I run ExtractText without -html, there is no space between the page number and the next word, so I see words like 1Massachusetts, 2Course, 3also, 4the.
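To make the expected output concrete, here is a minimal Java sketch. It is not PDFBox's PDFTextStripper or PDFText2HTML code. The class PageNumberOutput and its methods renderHtml and renderText are hypothetical and only illustrate the output this report asks for. In HTML mode, the page number sits in its own closed <p> element. In plain-text mode, a separator follows the page number. A line separator is assumed here; a single space would also fix the run-together words.

```java
// Hypothetical sketch, NOT PDFBox code: illustrates the output the report
// expects from ExtractText for a page that begins with its page number.
public class PageNumberOutput {

    // Escape the characters that are special in HTML text content.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // HTML mode: every paragraph, including the page number, is closed,
    // giving "<p>N</p>" instead of the unclosed "<p>N<p>..." seen in the bug.
    static String renderHtml(int pageNumber, String pageText) {
        return "<p>" + pageNumber + "</p>\n<p>" + escapeHtml(pageText) + "</p>\n";
    }

    // Plain-text mode: a line separator (an assumption; a space would also
    // work) keeps the page number from running into the first word,
    // avoiding output like "1Massachusetts".
    static String renderText(int pageNumber, String pageText) {
        return pageNumber + System.lineSeparator() + pageText + System.lineSeparator();
    }

    public static void main(String[] args) {
        System.out.print(renderHtml(1, "Massachusetts ..."));
        System.out.print(renderText(1, "Massachusetts ..."));
    }
}
```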
Issue Links
- relates to: PDFBOX-2160 PDFTextStripper doesn't always write paragraph start (Closed)