Many thanks! I'll upload the fix on our end when I get a chance.
Happy to help.
Given that PDFBox 2.0.0 is not out yet, are you open to upgrading the Tika code base to support that version of PDFBox (replacing support for PDFBox 1.x)?
Once 2.0 is out, yes, I think we'll upgrade pretty quickly. See TIKA-1285 and
PDFBOX-3058 for our collaboration in support of 2.0 regression testing. My dev branch for the integration with Tika is on GitHub.
like extracting XFA text. I can submit a patch for that as well if you're open to it.
I also noticed that you have some wrappers around Tika more generally. Again, if there's anything that would help Tika in general, please send it along. You may want to check out our RecursiveParserWrapper; it looks like it has some functionality that overlaps with what you're doing.
Happy extraction! Cheers!