Patch adding embedded PDF handling to ExtractText, plus a test case
(and test document).
I would really appreciate someone who's more familiar with PDFBox's
APIs having a look at what I did... I had to dig into various classes
that I don't really understand, such as PDDocumentCatalog.
I only extract text from embedded PDFs, not from other content types.
I noticed Tika's parser also fails to visit embedded documents within
a PDF... I'll open a separate issue.