PDFBOX-3796

Content of different table cells concatenated on text extraction in some cases

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.0.7, 3.0.0 PDFBox
    • Fix Version/s: None
    • Component/s: Text extraction

      Description

      Content of different table cells concatenated on text extraction in some cases.

      Please see the attachments for one of the problematic PDF files and the plain-text output extracted from it by PDFBox 2.0.6 and 3.0.0 (trunk).
      Snippet from the extracted text in which the content of different cells is concatenated:

      INDIVIDUAL RECSJeanette Bleckley03/17/2017/
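
One plausible reading of this symptom, which the report itself does not confirm: a PDF page stores positioned runs of text rather than table cells, so an extractor has to guess where separators belong from the horizontal gaps between runs. The Java sketch below illustrates that general gap-threshold idea only. It is not PDFBox's actual code. The `Fragment` class, the `joinLine` method, the coordinates and the thresholds are all hypothetical values chosen for illustration, not measurements from the attached file.

```java
import java.util.List;

public class GapJoinSketch {
    // Hypothetical text run: its string, left edge and width in page units.
    static final class Fragment {
        final String text;
        final double x;
        final double width;

        Fragment(String text, double x, double width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    // Joins the runs of one line, left to right, and inserts the separator
    // only when the gap to the previous run exceeds gapThreshold.
    static String joinLine(List<Fragment> line, double gapThreshold, String separator) {
        StringBuilder out = new StringBuilder();
        Fragment prev = null;
        for (Fragment f : line) {
            if (prev != null) {
                double gap = f.x - (prev.x + prev.width);
                if (gap > gapThreshold) {
                    out.append(separator);
                }
            }
            out.append(f.text);
            prev = f;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up layout: three cells whose text almost touches (gaps of 1 unit).
        List<Fragment> row = List.of(
                new Fragment("INDIVIDUAL RECS", 10, 90),
                new Fragment("Jeanette Bleckley", 101, 95),
                new Fragment("03/17/2017", 197, 55));

        // Gap (1) is below the threshold (3.0): the cells run together.
        System.out.println(joinLine(row, 3.0, " "));
        // prints "INDIVIDUAL RECSJeanette Bleckley03/17/2017"

        // Gap (1) exceeds the threshold (0.5): the cells are separated.
        System.out.println(joinLine(row, 0.5, " "));
        // prints "INDIVIDUAL RECS Jeanette Bleckley 03/17/2017"
    }
}
```

Under these assumptions, tightly packed cells look the same to the heuristic as adjacent characters of a single word, which would produce output like the snippet above without the extractor misreading any glyph.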

        Attachments

        1. fdl_relpub_foi_dailyre0313172017.pdf
          87 kB
          Yauheni Salopiy
        2. fdl_relpub_foi_dailyre0313172017_2.0.6.txt
          43 kB
          Yauheni Salopiy
        3. fdl_relpub_foi_dailyre0313172017_3.0.txt
          43 kB
          Yauheni Salopiy

            People

            • Assignee: Unassigned
            • Reporter: Yauheni Salopiy
            • Votes: 0
            • Watchers: 2
