PDFBox / PDFBOX-3796

Content of different table cells concatenated on text extraction in some cases

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.0.7, 3.0.0 PDFBox
    • Fix Version/s: None
    • Component/s: Text extraction

    Description

      In some cases, the content of different table cells is concatenated during text extraction.

      Please see the attachments: one of the problematic PDF files, plus the plain text extracted from it by PDFBox 2.0.6 and 3.0.0 (trunk).
      Snippet from the extracted text in which the contents of different cells are run together:

      INDIVIDUAL RECSJeanette Bleckley03/17/2017/
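
One common way such concatenation can arise is sketched below: an extractor that inserts a word separator only when the horizontal gap between adjacent text runs exceeds some threshold will fuse cells that sit very close together. This is a simplified illustration, not PDFBox's actual algorithm. The `Run` class, the `joinLine` helper, and all coordinates are hypothetical and were not measured from the attached PDF.

```java
import java.util.Arrays;
import java.util.List;

public class GapJoinSketch {
    // Hypothetical, simplified text run: its string, left edge, and width in page units.
    static final class Run {
        final String text;
        final double x;
        final double width;

        Run(String text, double x, double width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    // Joins the runs of one line, inserting a space only where the
    // horizontal gap to the previous run exceeds gapThreshold.
    static String joinLine(List<Run> runs, double gapThreshold) {
        StringBuilder sb = new StringBuilder();
        Run prev = null;
        for (Run r : runs) {
            if (prev != null && r.x - (prev.x + prev.width) > gapThreshold) {
                sb.append(' ');
            }
            sb.append(r.text);
            prev = r;
        }
        return sb.toString();
    }

    // Made-up cell positions: each cell starts 1 unit after the previous one ends.
    static List<Run> sampleCells() {
        return Arrays.asList(
                new Run("INDIVIDUAL RECS", 10, 90),
                new Run("Jeanette Bleckley", 101, 95),
                new Run("03/17/2017", 197, 60));
    }

    public static void main(String[] args) {
        // Threshold larger than the 1-unit gaps: the cells run together.
        System.out.println(joinLine(sampleCells(), 3.0));
        // Threshold smaller than the gaps: the cells are separated.
        System.out.println(joinLine(sampleCells(), 0.5));
    }
}
```

With these made-up values, the first call yields `INDIVIDUAL RECSJeanette Bleckley03/17/2017` and the second yields `INDIVIDUAL RECS Jeanette Bleckley 03/17/2017`. In PDFBox itself, `PDFTextStripper` exposes tuning setters such as `setSpacingTolerance(float)` and `setAverageCharTolerance(float)`; whether adjusting them separates the cells in the attached file is not established here.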

      Attachments

        1. fdl_relpub_foi_dailyre0313172017_3.0.txt
          43 kB
          Yauheni Salopiy
        2. fdl_relpub_foi_dailyre0313172017_2.0.6.txt
          43 kB
          Yauheni Salopiy
        3. fdl_relpub_foi_dailyre0313172017.pdf
          87 kB
          Yauheni Salopiy

          People

            Assignee: Unassigned
            Reporter: Yauheni Salopiy