  1. PDFBox
  2. PDFBOX-5415

Infinite loop in ExtractText in 2.x branch on a specific pdf

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.26
    • Fix Version/s: None
    • Component/s: Parsing
    • Labels: None

    Description

      DavidAvant reported an infinite loop in Tika and provided an example file. I can reproduce it with the ExtractText tool from the latest PDFBox app (2.0.26-SNAPSHOT).

      File: https://issues.apache.org/jira/secure/attachment/13042292/map.pdf

      Adobe and a slightly out-of-date pdftotext also have problems with this file.
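
      Until the loop is fixed, callers may want to bound the extraction time. The following is a minimal sketch, not PDFBox or Tika code: it runs an arbitrary extraction task on a worker thread and gives up after a timeout. The class name ExtractionWatchdog and the method runWithTimeout are hypothetical names introduced for illustration, and the hanging task is a stand-in for the real extraction call (with PDFBox 2.x that would presumably be something like new PDFTextStripper().getText(document)).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of PDFBox or Tika.
public class ExtractionWatchdog {

    /**
     * Runs the task on a daemon worker thread.
     * Returns the task's result, or null if it did not finish within timeoutMillis.
     */
    public static String runWithTimeout(Callable<String> task, long timeoutMillis)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "extract-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; a tight loop that never
            // checks the interrupt flag will keep running in the background.
            future.cancel(true);
            return null;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for an extraction call that never returns.
        Callable<String> hangs = () -> {
            while (true) {
                Thread.sleep(50);
            }
        };
        String text = runWithTimeout(hangs, 500);
        System.out.println(text == null ? "extraction timed out" : text);
    }
}
```

      Interrupting a thread does not stop a loop that never blocks or checks for interruption, so for a genuine infinite loop inside a parser, running the extraction in a separate process that can be killed is the more robust choice.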

      Attachments

        1. PDFBOX-5415-TIKA-3718-p10.pdf
          251 kB
          Tilman Hausherr

            People

              Assignee: Unassigned
              Reporter: Tim Allison (tallison)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: