As an update, I've already cleared the external libraries (see below for JAI) and I'm currently working on embedded resources and test files.
Java Advanced Imaging (JAI) components are included in PDFBox in two forms: the Java implementation in externals/jai_*.jar and the ICC profiles in src/main/resources/Resources/colorspace-profiles. Their license (the Sun Binary Code License) conflicts with Apache policies, so we can't distribute them in Apache releases. I'll start a discussion on the mailing list about how to resolve this.
For the resources, see the following issues I raised with the Apache legal team about the licensing of specific items we include. My understanding is that all of these should be OK for us to distribute, but it's better to have an official approval.
The test files under pdfbox/trunk/test are an interesting issue. The directory contains a wide variety of real-world PDF documents. This makes for a great test suite, but it is a bit problematic from a licensing point of view. At least the files in test/encryption, test/input, and test/pdfparser don't seem to come with any licensing or copyright information. I'll also raise this issue on the mailing list.