PDFBox / PDFBOX-2376

Small regression in text extraction with PDFBox 1.8.7 vs. 1.8.6


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.8.7, 1.8.8, 2.0.0
    • Fix Version/s: 1.8.8, 2.0.0
    • Component/s: Parsing

    Description

      On at least one file in govdocs1, PDFBox 1.8.7 extracts less text than 1.8.6 did. When running the app.jar with ExtractText, 1.8.7 is not extracting:

      Designated Counties
      No Designation
      Individual Assistance
      All counties are eligible
      ITS Mapping & Analysis CenterWashington, DC
      05/09/08 -- 09:36 AM EDT
      Source: Disaster Federal Registry Notice05/08/2008
      Location Map
      MapID 196d109cd27
      for Hazard Mitigation
      
      

      from govdocs1's 894770.pdf.
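
      A regression like this can be pinned down by diffing the text produced by the two versions. The sketch below is only an illustration, not part of PDFBox: it assumes ExtractText from each version's app jar has already written its output to a UTF-8 text file, and the class name ExtractionDiff and the file names in the usage note are hypothetical. It reports every non-blank line found in the older output that has no counterpart in the newer one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a PDFBox class): lists lines that an older
// extraction produced and a newer extraction no longer contains.
final class ExtractionDiff {

    /**
     * Returns the trimmed, non-blank lines of oldLines that have no
     * matching line left in newLines. Duplicates are counted, so a line
     * that appeared twice before and once now is reported once.
     */
    static List<String> missingLines(List<String> oldLines, List<String> newLines) {
        // Count how often each trimmed line occurs in the newer output.
        Map<String, Integer> remaining = new HashMap<>();
        for (String line : newLines) {
            String key = line.trim();
            if (!key.isEmpty()) {
                remaining.merge(key, 1, Integer::sum);
            }
        }
        // Walk the older output in order and consume matches from the counts.
        List<String> missing = new ArrayList<>();
        for (String line : oldLines) {
            String key = line.trim();
            if (key.isEmpty()) {
                continue;
            }
            Integer count = remaining.get(key);
            if (count == null || count == 0) {
                missing.add(key);
            } else {
                remaining.put(key, count - 1);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ExtractionDiff <old-output.txt> <new-output.txt>");
            return;
        }
        // Assumption: both files are UTF-8 encoded text.
        List<String> oldLines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> newLines = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        for (String line : missingLines(oldLines, newLines)) {
            System.out.println(line);
        }
    }
}
```

      With hypothetical file names, `java ExtractionDiff 894770-1.8.6.txt 894770-1.8.7.txt` would print the lines that only the 1.8.6 output contains. The comparison is line-based, so text that merely moved to a different line or was re-split would also be reported and needs a manual check.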

      Attachments

        1. 466070.pdf
          338 kB
          Tim Allison
        2. 894770.pdf
          360 kB
          Tim Allison
        3. PDFBOX-2376-179204-EMC.pdf
          715 kB
          Tilman Hausherr
        4. PDFBOX-2376-908436.pdf
          15 kB
          Tilman Hausherr

            People

              Assignee: Tilman Hausherr
              Reporter: Tim Allison
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: