Tika / TIKA-1912

Figure out how to parse truncated PDFs that were handled by PDFBox 1.8.x but not by 2.0.0

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      While working on TIKA-1285, we found that PDFBox 2.0.0 is not able to handle truncated files as well as PDFBox 1.8.11. Let's figure out how to gain the benefits from 2.0.0 without losing the ability to extract some content from truncated files.
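One way to triage the affected files is to flag PDFs that look truncated before handing them to the parser. The following is a minimal sketch, not part of Tika or PDFBox: the class name `TruncatedPdfCheck`, the method `looksTruncated`, and the 1024-byte tail window are all assumptions made for illustration. It relies only on the fact that a complete PDF ends with a `%%EOF` marker, so a missing marker near the end of the file is a heuristic sign of truncation, not proof.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not a Tika or PDFBox API.
public class TruncatedPdfCheck {
    // A complete PDF ends with this marker.
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);
    // Assumed search window at the end of the file.
    private static final int TAIL_WINDOW = 1024;

    /** Returns true if no %%EOF marker starts within the last TAIL_WINDOW bytes. */
    public static boolean looksTruncated(byte[] data) {
        int start = Math.max(0, data.length - TAIL_WINDOW);
        // Scan backwards from the last position where the marker could fit.
        for (int i = data.length - EOF_MARKER.length; i >= start; i--) {
            if (matchesAt(data, i)) {
                return false;
            }
        }
        return true;
    }

    // Compares the marker against data starting at offset.
    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < EOF_MARKER.length; j++) {
            if (data[offset + j] != EOF_MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] complete = "%PDF-1.4\n1 0 obj\n<<>>\nendobj\ntrailer\n<<>>\nstartxref\n9\n%%EOF\n"
                .getBytes(StandardCharsets.US_ASCII);
        // Simulate truncation by keeping only the first 20 bytes.
        byte[] cut = Arrays.copyOf(complete, 20);
        System.out.println(looksTruncated(complete)); // false
        System.out.println(looksTruncated(cut));      // true
    }
}
```

A check like this could be used to collect a regression set of truncated files and compare how much text each PDFBox version extracts from them.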

            People

              Assignee: Unassigned
              Reporter: Tim Allison (tallison)
              Votes: 1
              Watchers: 3