There are issues with some PDF documents when it gets indexed (extracted?). There is an issue being fixed by PDFBOX in the version PDFBox 1.1.0. But Apache solr 1.4.1 doesn't have the latest version of these jars which is causing these failures. We have tika-pareser0.4 in this solr 1.4.1 distribution which has to be updated to 0.9 version.
Reference for the issue and the solution : https://issues.apache.org/jira/browse/PDFBOX-617