Details
-
Improvement
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
None
-
None
-
None
Description
The PDF parser currently extracts and outputs document content as a single string. PDFBox could be used to support structuring at least down to page and paragraph (not sure how accurate) level.