  1. PDFBox
  2. PDFBOX-4480

Problem extracting text: missing newline characters and spaces between words


Details

    • Important

    Description

       

      I have a PDF file. When I try to extract its text using

      it ignores some line breaks between lines, so the last word on a line and the first word on the next line appear as one word with no space between them.

      For example, in the attached PDF:

      "main Bsk" is extracted as "mainBsk"

      "narasimhan1989@gmail.com Bangalore" is extracted as "narasimhan1989@gmail.comBangalore"
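
      The symptom can be checked mechanically once the text has been extracted. The following is only a minimal sketch in plain Java, not PDFBox code: the class `MergedWordCheck` and its methods are hypothetical helpers, and the sample string in `main` is constructed by hand from the two word pairs quoted in this report, not taken from the actual extraction output.

```java
import java.util.regex.Pattern;

// Hypothetical helper for checking extracted text; not part of PDFBox.
final class MergedWordCheck {

    /** True if `first` is immediately followed by `second`, with nothing in between. */
    static boolean isMerged(String text, String first, String second) {
        return text.contains(first + second);
    }

    /** True if `first` and `second` are separated only by whitespace (space, tab, or line break). */
    static boolean isSeparated(String text, String first, String second) {
        Pattern p = Pattern.compile(Pattern.quote(first) + "\\s+" + Pattern.quote(second));
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // Hand-written sample mimicking the reported symptom (assumption, not real output).
        String observed = "mainBsk\nnarasimhan1989@gmail.comBangalore";

        System.out.println("main/Bsk merged: "
                + isMerged(observed, "main", "Bsk"));
        System.out.println("email/Bangalore separated: "
                + isSeparated(observed, "narasimhan1989@gmail.com", "Bangalore"));
    }
}
```

      Running the same two checks against the text extracted from the attached PDF would confirm whether a given extraction setting still merges the words.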

      Attachments

        1. PDFBOX-4480-huge-CapHeight.pdf-sorted.txt
          5 kB
          Tilman Hausherr
        2. PDFBOX-4480-huge-CapHeight.pdf.txt
          5 kB
          Tilman Hausherr
        3. Narasimhan S.pdf
          87 kB
          ANIL SANGHANI
        4. Document.txt
          5 kB
          ANIL SANGHANI

            People

              Tilman Hausherr
              ANIL SANGHANI
              Votes: 0
              Watchers: 3