PDFBox / PDFBOX-4232

Spaces getting added in between a word in scanned documents

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None

      Description

      As a consumer of this API, I am facing an issue when I try to extract text from scanned PDFs.

      In the extracted output, a space appears inside the word "of", and in a few places two spaces are added between two words.

      Examples follow.

      In the example below, a space is added inside the word "of":

      In PDF:

      is made as of October 13,2015 between XYZ and ABC.

      After extraction:

      is made as o f October 13,2015 between XYZ and ABC.

      In the example below, the letters of "WHEREAS," are also separated, and two spaces are added after it:

      In PDF: WHEREAS, Navigation

      After extraction: W h e r e a s ,  Navigation
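
      To help confirm that the extracted text differs from the PDF text only in spacing, a small check along the following lines could be used. This is only a sketch and is not part of PDFBox: the class `SpacingCheck` and its methods are hypothetical names, and the input strings are the two examples from this report.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for this report; not a PDFBox class.
class SpacingCheck {

    // Drops all whitespace so only the character sequence is compared.
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    // True when both strings match (ignoring case) once whitespace is removed,
    // i.e. the extraction changed the spacing but not the characters.
    static boolean sameCharactersIgnoringWhitespace(String expected, String extracted) {
        return stripWhitespace(expected).equalsIgnoreCase(stripWhitespace(extracted));
    }

    // Lists the words of the expected text that no longer appear as a single
    // whitespace-delimited token in the extracted text.
    static List<String> brokenWords(String expected, String extracted) {
        Set<String> tokens = new HashSet<>();
        for (String t : extracted.trim().split("\\s+")) {
            tokens.add(t.toLowerCase(Locale.ROOT));
        }
        List<String> broken = new ArrayList<>();
        for (String w : expected.trim().split("\\s+")) {
            if (!tokens.contains(w.toLowerCase(Locale.ROOT))) {
                broken.add(w);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        String pdf1 = "is made as of October 13,2015 between XYZ and ABC.";
        String out1 = "is made as o f October 13,2015 between XYZ and ABC.";
        System.out.println(sameCharactersIgnoringWhitespace(pdf1, out1)); // true
        System.out.println(brokenWords(pdf1, out1));                      // [of]

        String pdf2 = "WHEREAS, Navigation";
        String out2 = "W h e r e a s ,  Navigation";
        System.out.println(sameCharactersIgnoringWhitespace(pdf2, out2)); // true
        System.out.println(brokenWords(pdf2, out2));                      // [WHEREAS,]
    }
}
```

      For both examples the characters match once whitespace is ignored, which suggests the problem is in where spaces are inserted and not in the characters themselves.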

            People

            • Assignee: Unassigned
            • Reporter: Niyati wawre (nwawre@brightleaf.com)