PDFBox / PDFBOX-5595

Slight regression on corrupt bug tracker file

Details

    • Type: Task
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.0.28, 3.0.0 PDFBox
    • Fix Version/s: 2.0.29, 3.0.0 PDFBox
    • Component/s: Parsing
    • Labels: None

    Description

      I'm not sure this is a regression, and apologies if you already dealt with this before the release of 2.0.28. Also, as a warning, this file is corrupt.

      We used to get more text out of this file in 2.0.27 than we do now in 2.0.28: https://corpora.tika.apache.org/base/docs/bug_trackers/evince/evince-395-0.zip-0.pdf

      This file was derived from the evince bug tracker, which now eventually links to this issue:

      https://gitlab.freedesktop.org/poppler/poppler/-/issues/323

      This image from the poppler issue shows what we get with PDFBox 2.0.28 on the left, and 2.0.27 on the right.

      If the decision is "the file is corrupt -> not going to fix", I completely understand.

            People

              Assignee: lehmi Andreas Lehmkühler
              Reporter: tallison Tim Allison
              Votes: 0
              Watchers: 4
