PDFBox
PDFBOX-448

Columns in text not extracted separately.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None

      Description

      The paper attached to PDFBOX-80 has two columns of text, but the extracted text is not separated by column. Instead, each extracted line combines the text from both columns at that vertical position.

      PDFTextStripper has a notion of columns and "articles / beads", but they are not being used with this file.
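
To make the reported symptom concrete, here is a minimal, self-contained sketch of the two orderings. It is not PDFBox code and does not reflect how PDFTextStripper works internally. The `Fragment` type, the `samplePage` data, and the fixed column boundary at x = 300 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ColumnOrderSketch {

    /** Hypothetical text fragment: its text, left edge x, and distance y from the top of the page. */
    public static final class Fragment {
        final String text;
        final double x;
        final double y;

        public Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Top to bottom first, then left to right.
    private static final Comparator<Fragment> READING_ORDER =
            Comparator.comparingDouble((Fragment f) -> f.y).thenComparingDouble(f -> f.x);

    /** Line-by-line output: fragments sharing a y value end up on one line, whatever column they are in. */
    public static String linesInReadingOrder(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(READING_ORDER);
        StringBuilder out = new StringBuilder();
        boolean first = true;
        double lastY = 0;
        for (Fragment f : sorted) {
            if (!first) {
                out.append(f.y == lastY ? ' ' : '\n');
            }
            out.append(f.text);
            lastY = f.y;
            first = false;
        }
        return out.toString();
    }

    /** Column-aware output: everything left of boundaryX is emitted before everything right of it. */
    public static String columnByColumn(List<Fragment> fragments, double boundaryX) {
        List<Fragment> left = new ArrayList<>();
        List<Fragment> right = new ArrayList<>();
        for (Fragment f : fragments) {
            if (f.x < boundaryX) {
                left.add(f);
            } else {
                right.add(f);
            }
        }
        String leftText = linesInReadingOrder(left);
        String rightText = linesInReadingOrder(right);
        if (leftText.isEmpty()) {
            return rightText;
        }
        if (rightText.isEmpty()) {
            return leftText;
        }
        return leftText + "\n" + rightText;
    }

    /** Made-up two-column page with two lines per column. */
    public static List<Fragment> samplePage() {
        return Arrays.asList(
                new Fragment("Left one", 50, 100),
                new Fragment("Right one", 320, 100),
                new Fragment("Left two", 50, 112),
                new Fragment("Right two", 320, 112));
    }

    public static void main(String[] args) {
        System.out.println(linesInReadingOrder(samplePage()));
        System.out.println("---");
        System.out.println(columnByColumn(samplePage(), 300));
    }
}
```

The first method reproduces the behaviour described in this issue: text from both columns is interleaved on each line. The second shows the ordering a reader would expect. A real implementation would need to detect the column boundary (or use the article/bead information in the PDF) instead of taking it as a parameter.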

      1. WBPaper00003120.pdf
        407 kB
        Arun Rangarajan

          People

          • Assignee: Unassigned
          • Reporter: Brian Carrier
          • Votes: 2
          • Watchers: 4
