Tika / TIKA-1912

Figure out how to parse truncated PDFs that were handled by PDFBox 1.8.x but not by 2.0.0

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      While working on TIKA-1285, we found that PDFBox 2.0.0 does not handle truncated files as well as PDFBox 1.8.11 did. Let's figure out how to gain the benefits of 2.0.0 without losing the ability to extract some content from truncated files.

              People

              • Assignee: Unassigned
              • Reporter: Tim Allison (tallison@apache.org)
              • Votes: 1
              • Watchers: 3
