Details
- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
Description
When text is extracted from a PDF that contains special characters ("", "=", ">", "-"), these characters show up as different symbols.
I've attached two PDFs that illustrate the issue differently:
- 625006.pdf has multiple pages. When the text is extracted from a table, certain characters show up as a ? symbol.
- example.pdf is a single page with the same table. When the text is extracted, the same characters show up as " or # symbols.
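To narrow down which characters are being substituted, it can help to dump the code points of the extracted text. The following is only a diagnostic sketch, not part of the processor: the function name `describe_suspect_chars` and the sample string are hypothetical, and it assumes the extracted text is already available as a Python string. It flags every literal "?", the Unicode replacement character (U+FFFD), and any non-ASCII character.

```python
import unicodedata


def describe_suspect_chars(text):
    """Return (index, char, code point, Unicode name) for characters worth inspecting.

    Hypothetical helper: flags '?', U+FFFD, and any non-ASCII character.
    """
    report = []
    for index, ch in enumerate(text):
        is_placeholder = ch in ("?", "\ufffd")  # common substitution symbols
        is_non_ascii = ord(ch) > 127
        if is_placeholder or is_non_ascii:
            name = unicodedata.name(ch, "UNNAMED")
            report.append((index, ch, f"U+{ord(ch):04X}", name))
    return report


if __name__ == "__main__":
    # Made-up sample standing in for text extracted from a table cell.
    sample = "a ? b \u2265 c"
    for entry in describe_suspect_chars(sample):
        print(entry)
```

Because a legitimate question mark is flagged too, the report is a starting point for comparing the output against the source PDF rather than a definitive list of corrupted characters.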
Attachments
Issue Links
- is related to NIFI-9647 Add ExtractDocumentText Processor (Resolved)