PDFBox / PDFBOX-61

Spaces in extracted file

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0-incubator
    • Component/s: Text extraction
    • Labels: None

    Description

      [imported from SourceForge]
      http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1208824
      Originally submitted by nobody on 2005-05-25 16:40.

      While trying to integrate PDFBox with Lucene, I ran into
      problems. The Lucene people suggested that I check
      the output of the extract utility against one of my test PDFs.
      When I did, I saw spaces inserted inside many of the
      words. I was on version 0.7.0, so I downloaded 0.7.1,
      but I see the same results.

      One of the test files where I see this issue is attached.

      [attachment on SourceForge]
      http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1208824&file_id=135995
      Tom_3.pdf (application/pdf), 10145 bytes
      Test pdf file.

            People

              Assignee: Unassigned
              Reporter: Jukka Zitting (jukkaz)