Details
Type: Wish
Status: Open
Priority: Minor
Resolution: Unresolved
None
None
None
Description
I created a custom Docker image using the latest Tesseract release. I came across the Tika Dockerfile, which installs the following dependencies:
xfonts-utils
fonts-freefont-ttf
fonts-liberation
ttf-mscorefonts-installer
cabextract
I have not yet found any documentation about these dependencies at https://cwiki.apache.org/confluence/display/tika or https://github.com/apache/tika. I can only guess that they might affect how PDF content is handled.