As described in this Stack Overflow post, I'm having trouble extracting text from scanned PDF files. By scanned PDF files I mean PDF files that consist only of images. Because each page is an image, I can't extract them using a custom ParsingEmbeddedDocumentExtractor. I also tried the setExtractInlineImages method of PDFParserConfig, but that didn't work either.
There is already a ticket regarding OCR support, and it includes the PDF file I'm using for my tests.
Here is a JUnit test that demonstrates my issue: