  1. PDFBox
  2. PDFBOX-4232

Spaces getting added in between a word in scanned documents

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None

    Description

      As a consumer of this API, I am facing an issue when I try to extract text from scanned PDFs.

      In the extracted output, a space appears inside the word "of", and in a few places two spaces are added between two words.

      The following are examples.

      In the example below, a space is added inside the word "of":

      In PDF:

      is made as of October 13,2015 between XYZ and ABC.

      After extraction:

      is made as o f October 13,2015 between XYZ and ABC.

      Also, in the example below, two spaces are added after "WHEREAS,":

      In PDF: WHEREAS, Navigation

      After extraction: W h e r e a s ,  Navigation
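
      The two examples above can be checked mechanically. The following is a minimal sketch, not part of the original report, that tests whether an extracted line differs from the expected line only by whitespace (and letter case, since the second example also changes case). The class SpacingCheck and its methods are hypothetical helpers for illustration and are not part of the PDFBox API.

```java
import java.util.Locale;

// Hypothetical helper (not PDFBox API): characterizes the symptom reported above.
final class SpacingCheck {

    // Remove all whitespace and fold case so only the visible characters are compared.
    static String normalize(String s) {
        return s.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    // True when the two strings are not identical but contain the same
    // characters once whitespace and letter case are ignored.
    static boolean differsOnlyBySpacing(String expected, String extracted) {
        return !expected.equals(extracted)
                && normalize(expected).equals(normalize(extracted));
    }
}
```

      Passing the "In PDF" and "After extraction" strings from either example to differsOnlyBySpacing should confirm that the mismatch is limited to inserted spaces (and case), which may help narrow a reproduction down to the affected lines.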

          People

            Assignee: Unassigned
            Reporter: Niyati wawre
            Votes: 0
            Watchers: 3
