Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Won't Fix
Description
I have a PDF that uses an Arabic font, and Tika fails to extract its
text (the output is all gibberish).
The PDF does not appear to include the separate Unicode text metadata
(hmm: would Tika extract that if it were present?), and copy/paste out
of the PDF also produces gibberish.
To fix this, I think we would somehow have to know the glyph-to-Unicode
mapping for the font (this particular font is AXTManal).
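To make the "know the mapping" idea concrete, here is a minimal sketch of what applying such a table to already-extracted text could look like. It is not part of Tika or PDFBox: the class name LegacyFontRemapper, the method names, and the two-entry table are all hypothetical, and the real AXTManal mapping is not known here. Legacy Arabic fonts may also store glyphs in visual order or use presentation forms, which this sketch does not handle.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch only: remaps text extracted from a PDF whose font uses a
 * private glyph encoding. The table passed in is hypothetical; the
 * real AXTManal mapping is not known here.
 */
final class LegacyFontRemapper {

    private LegacyFontRemapper() {}

    /** Replaces each code point that has a table entry; others pass through unchanged. */
    static String remap(String extracted, Map<Integer, String> glyphToUnicode) {
        StringBuilder out = new StringBuilder(extracted.length());
        extracted.codePoints().forEach(cp -> {
            String mapped = glyphToUnicode.get(cp);
            if (mapped != null) {
                out.append(mapped);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    /** Builds a tiny made-up table purely for illustration. */
    static Map<Integer, String> exampleTable() {
        Map<Integer, String> table = new HashMap<>();
        table.put((int) 'A', "\u0627"); // hypothetical: 'A' maps to ARABIC LETTER ALEF
        table.put((int) 'B', "\u0628"); // hypothetical: 'B' maps to ARABIC LETTER BEH
        return table;
    }
}
```

Calling LegacyFontRemapper.remap(text, LegacyFontRemapper.exampleTable()) on the extractor's output would rewrite only the characters present in the table. A per-font table like this would have to be built by hand for each such font, which is the main cost of the approach.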
Attachments
Issue Links
- is related to TIKA-1337 LanguageProfile for Persian/Farsi (Resolved)