On TIKA-2045, a user posted a single-page document that leads to an OOM with -Xmx1g. I confirmed this with PDFBox's ExtractText.
Might be a memory leak with the fonts? See this for some diagnostics I did.
2.0 much slower than 1.8 for text extraction with certain PDF files
TIKA crashes / runs out of memory on simple PDF